Tech Blog: Ensuring our services are always available

At Crunch, we’re creating a computing platform for our services, and we’re using AWS Auto Scaling groups to ensure they’re always available.


Auto Scaling groups


Auto Scaling groups have some really neat features, including the ability to perform rolling upgrades. This means that we can update our services without stopping them.


However, just occasionally, something can go wrong.


Sometimes a new instance just fails to start cleanly. That’s OK, it happens. And AWS is smart enough to fix it by rolling back to the old instance configuration. But this introduces a new problem: we have no reliable way to know, or control, the order in which AWS will stop the service instances.


Say we have a cluster of three instances of a service, and we need to keep two of them running all the time to maintain a quorum. We then start a rolling upgrade, and the new instance fails to start, which is OK as we still have two running and the cluster is still available.


But AWS can terminate any of the three instances. If one of the remaining two ‘good’ instances is stopped, we no longer have a quorum and the service is no longer available.
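
To make the arithmetic concrete, here is a minimal sketch of that scenario in Python. The instance names, the `has_quorum` and `risky_terminations` helpers and the quorum size of two are illustrative assumptions taken from the example above, not part of any AWS API.

```python
from typing import Dict, List

QUORUM = 2  # assumption from the example: two of three instances must stay up


def has_quorum(healthy: Dict[str, bool], quorum: int = QUORUM) -> bool:
    """Return True if at least `quorum` instances are healthy."""
    return sum(healthy.values()) >= quorum


def risky_terminations(healthy: Dict[str, bool], quorum: int = QUORUM) -> List[str]:
    """List the instances whose termination would break the quorum."""
    risky = []
    for name in healthy:
        # Simulate the cluster with this one instance removed.
        remaining = {k: v for k, v in healthy.items() if k != name}
        if not has_quorum(remaining, quorum):
            risky.append(name)
    return risky


# Hypothetical cluster: two healthy old instances plus one new instance
# that failed to start.
cluster = {"old-a": True, "old-b": True, "new-c": False}
print(has_quorum(cluster))          # True
print(risky_terminations(cluster))  # ['old-a', 'old-b']
```

Two of the three possible terminations take the service down, which is why leaving the choice of instance to chance is a problem.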


Our solution


We decided to fix this. We’ve written our own software to make sure that we keep a working cluster during a rolling upgrade, even in the unlikely event that something goes wrong.


Our “asg_rolling_upgrade” software iterates over the instances in the Auto Scaling group, stopping each instance with the old configuration and replacing it with a new instance with the new configuration. But we control the order of stopping the instances; our code works from the oldest running instance to the newest.


If an instance doesn’t start cleanly or fails to join the cluster, the upgrade stops and one of our SysAdmins can investigate. Meanwhile, we still have a safe working cluster.
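
The approach described above can be sketched roughly as follows. This is not the actual asg_rolling_upgrade code: the `Instance` class, the `rolling_upgrade` function and the `launch`, `is_healthy` and `terminate` callbacks are hypothetical stand-ins for whatever AWS calls and cluster health checks a real tool would use, so the sketch runs without an AWS account.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Instance:
    instance_id: str
    launch_time: float  # seconds since the epoch
    config: str         # name of the configuration the instance was built from


def rolling_upgrade(
    instances: List[Instance],
    new_config: str,
    launch: Callable[[str], Instance],
    is_healthy: Callable[[Instance], bool],
    terminate: Callable[[Instance], None],
) -> List[Instance]:
    """Replace old-configuration instances one at a time, oldest first.

    Raises RuntimeError as soon as a replacement is unhealthy, leaving the
    remaining old instances running so someone can investigate.
    """
    cluster = list(instances)
    # Only instances on the old configuration need replacing; oldest goes first.
    outdated = sorted(
        (i for i in cluster if i.config != new_config),
        key=lambda i: i.launch_time,
    )
    for old in outdated:
        terminate(old)
        cluster.remove(old)
        replacement = launch(new_config)
        if not is_healthy(replacement):
            raise RuntimeError(
                f"{replacement.instance_id} failed to join; upgrade halted"
            )
        cluster.append(replacement)
    return cluster


# Usage with in-memory fakes instead of real AWS calls.
terminated: List[str] = []
ids = iter(range(1, 100))


def fake_launch(config: str) -> Instance:
    return Instance(f"new-{next(ids)}", 1000.0, config)


old_cluster = [
    Instance("i-b", 20.0, "v1"),
    Instance("i-a", 10.0, "v1"),
    Instance("i-c", 30.0, "v1"),
]
upgraded = rolling_upgrade(
    old_cluster, "v2", fake_launch,
    is_healthy=lambda i: True,
    terminate=lambda i: terminated.append(i.instance_id),
)
print(terminated)  # ['i-a', 'i-b', 'i-c']
```

Because only one instance is stopped before the health check runs, a failed replacement in a three-instance cluster still leaves two untouched instances, which matches the quorum in the earlier example.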


Give it a whirl


We’re feeling pretty pleased with ourselves over this, so we decided to share what we’ve done. If you want to take a closer look, our asg_rolling_upgrade is available on GitHub.


Feel free to give it a try for yourself!


Trevor Marshall is Platform Lead at Crunch. His software experience ranges from Java development to real-time simulation systems. Away from the computer, Trevor is a Morris dancer and has just acquired a classic campervan.


Want to get involved? We’re hiring!
