Online services have to be reliable to succeed (with the possible exception of Twitter), and we’re incredibly proud of our reliability record. In our three-year lifespan we’ve only had one significant outage, and even that only lasted a few hours.
However, you don’t maintain an exemplary reliability record by resting on your laurels. We’ve been cooking up a major overhaul of Crunch’s technical underpinnings for several months now, and last night we rolled it out.
The headline from a technical perspective is that we’ve moved from a dedicated static host to Amazon EC2. The main advantage of EC2 is that it is a fluid, scalable platform, so if we need more processing power to keep everything chugging along smoothly we can simply turn it up.
Our new infrastructure looks like this (although this is a little simplified) –
We now have two web servers and two app servers where before we had one of each, and a load balancer makes sure the work is split evenly between them. (For those wondering, the web server is the part that displays all the user-facing stuff, and the app server is the part that does all the number-crunching.)
We’ve also prepared a third app server and a third web server, ready for whenever they’re needed. As our new infrastructure is several times more powerful than our previous setup, though, we’re estimating they won’t be turned on for a while yet.
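For the technically curious, the idea of splitting work evenly between servers can be sketched in a few lines of Python. This is purely illustrative: it assumes a simple round-robin policy (the real load balancer may use a different one), and the server names are made up.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in strict rotation so each gets an equal share."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rotation = cycle(self.servers)

    def add_server(self, name):
        # Bringing a standby server online: rebuild the rotation so the
        # new server is included. The rotation restarts from the first server.
        self.servers.append(name)
        self._rotation = cycle(self.servers)

    def next_server(self):
        # Each call returns the next server in the rotation.
        return next(self._rotation)


# Hypothetical names for the two web servers.
balancer = RoundRobinBalancer(["web-1", "web-2"])
first_four = [balancer.next_server() for _ in range(4)]
print(first_four)
```

Calling `balancer.add_server("web-3")` in this sketch plays the part of switching on a standby server: from then on, requests rotate across three machines instead of two.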
We’ve also brought in a clustered file server, which will be used to store all the documents Crunch generates (interim accounts, tax returns, etc.) and all the expense records uploaded through the web or Snap.
Alongside our new scalable infrastructure, we’re introducing some new levels of redundancy. Our new system will snapshot and back up user data every five minutes within EC2, and we’re also replicating these backups to another secure host as part of our disaster recovery solution.
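To give a feel for what a five-minute snapshot interval means in practice, here is a small Python sketch that works out the most recent snapshot before a given moment and how much recent activity would fall after it. It assumes snapshots are aligned to the clock (00:00, 00:05, 00:10 and so on), which is an assumption made for illustration, and the failure time is hypothetical.

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(minutes=5)  # the interval quoted above


def last_snapshot_before(moment, interval=SNAPSHOT_INTERVAL):
    """Return the most recent snapshot time at or before `moment`.

    Assumes snapshots are aligned to midnight: 00:00, 00:05, 00:10, ...
    """
    midnight = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = moment - midnight
    # Dividing one timedelta by another with // gives a whole number
    # of completed intervals.
    completed_intervals = elapsed // interval
    return midnight + completed_intervals * interval


failure = datetime(2012, 1, 1, 14, 32, 10)  # hypothetical failure time
snapshot = last_snapshot_before(failure)
at_risk = failure - snapshot  # activity since the last snapshot
print(snapshot.time(), at_risk)
```

Under these assumptions the window of activity not yet captured by a snapshot is always shorter than five minutes.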
A little more technical information: all our hosting is located within the EU and complies with the EU Data Protection Directive (the legislation that the “Safe Harbour” rules are based on), and our EC2 hosting is spread across multiple availability zones. This means that if one availability zone should suffer an outage (as happened on the East Coast of the US last year), our load will simply be distributed elsewhere.
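The availability zone behaviour can be pictured with a short Python sketch. It is a simplified model rather than a description of how EC2 routes traffic internally: the zone names, the number of zones and the request count are all hypothetical, and it assumes load is shared equally between healthy zones.

```python
def distribute_load(total_requests, zone_health):
    """Split requests evenly across the zones currently marked healthy."""
    healthy = [zone for zone, ok in zone_health.items() if ok]
    if not healthy:
        raise RuntimeError("no healthy availability zones")
    share, remainder = divmod(total_requests, len(healthy))
    # Give the first `remainder` zones one extra request so the totals add up.
    return {
        zone: share + (1 if index < remainder else 0)
        for index, zone in enumerate(healthy)
    }


# Hypothetical zone names; True means the zone is healthy.
zones = {"zone-a": True, "zone-b": True}
normal = distribute_load(1000, zones)

zones["zone-a"] = False  # simulate an outage in one zone
during_outage = distribute_load(1000, zones)
print(normal, during_outage)
```

In this model the surviving zone simply absorbs the whole load while the other is down, which is why spare capacity matters.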
From a user’s perspective there will be no change at all. Crunch will still work the same as it always has – with this upgrade we’re simply laying some sensible groundwork for future growth, and making our infrastructure a little more streamlined to make life easier for our technical team.