Upgrading Your Backend Software: Software Hacks

Sean Chaney

Evolving your product isn't easy. Best case, nothing breaks, users love it, and you high-five until your hand hurts. Worst case, Digg v4.

But it doesn't have to be hard. There are ways to make the transition smoother. For instance, Adzerk recently migrated from Reporting Backend 1.0 to 2.0 without any major incidents or late nights. How'd we do it? Read on!

Background

Here's a technical diagram of what our Reporting 1.0 Backend looked like:

[Image: fire]

Joking aside, our 1.0 backend had reached its scalability limits, making it a constant source of firefighting. Not particularly liking this, we assembled a team to rewrite it, and soon we had a shiny new Reporting 2.0 Backend. The next step was decommissioning 1.0, but we didn't want to switch it over without further testing. This is where we used our good friend, The Four Step Process™:

The Four Step Process™

  • Step 1 - Write to both places.

  • Step 2 - Read from the new place, fall back to the old.

  • Step 3 - Stop reading from the old place.

  • Step 4 - Stop writing to the old place.

"Place" here refers to an address for data. It could be database columns, cache keys, a variable name, a function call, a path on a filesystem, a URI, or just about anything!

Four Step Example

Say you have a system that writes a single name value: first name and last name concatenated. Well, that's no good; we want first-name and last-name written individually. This will speed up our alphabetizer service tenfold. Time to 4-step this bad boy!

Step 1

We write to the new places, first-name and last-name, while still writing to name.
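As a rough sketch of what this dual write could look like (not anyone's production code): the in-memory `store`, the `save_user` helper, and the single-space separator in name are all assumptions made up for illustration.

```python
# Hypothetical in-memory store standing in for a real database table.
store = {}


def save_user(user_id, first_name, last_name):
    """Step 1: write the new fields while still writing the old one."""
    store[user_id] = {
        # New places.
        "first-name": first_name,
        "last-name": last_name,
        # Old place, kept up to date so existing readers keep working.
        # Joining with a single space is an assumption for this example.
        "name": f"{first_name} {last_name}",
    }


save_user(1, "Ada", "Lovelace")
print(store[1])
```

Nothing reads the new fields yet, so a bug in the new write path is visible in the data without being visible to users.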

Step 2

Alright, the new data looks legit, so we start reading from first-name and last-name; if they don't exist, we fall back to reading from name. No problem.
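A minimal sketch of the read-with-fallback logic, assuming records are plain dictionaries and that the old name value can be split on its first space (a naive rule that real names will break; `read_name` is a hypothetical helper):

```python
def read_name(record):
    """Step 2: prefer the new fields, fall back to the old one."""
    first = record.get("first-name")
    last = record.get("last-name")
    if first is not None and last is not None:
        return first, last
    # Fallback for records written before Step 1: split the old
    # concatenated value on its first space (assumed separator).
    first, _, last = record["name"].partition(" ")
    return first, last


new_record = {"first-name": "Ada", "last-name": "Lovelace", "name": "Ada Lovelace"}
old_record = {"name": "Grace Hopper"}
print(read_name(new_record))
print(read_name(old_record))
```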

Step 3

Let's say our data is not ephemeral and we care about keeping old values. Before we can stop reading from the old place, we need to backfill the old data, making sure every record with a name also has a first-name and last-name.

This step can be tricky, which is why it's important that we're still writing to name: if the backfill goes wrong, the old value is still there to fall back on.
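Here is one way the backfill might look, again as a sketch under assumptions: records are dictionaries in a list, `backfill` is a hypothetical helper, and the first-space split is the same naive rule as before. It skips records that already have the new fields, so it is safe to run more than once.

```python
def backfill(records):
    """Step 3 prep: give every old record a first-name and last-name.

    Returns the number of records that were changed.
    """
    changed = 0
    for record in records:
        if "first-name" in record and "last-name" in record:
            continue  # Already migrated by the Step 1 dual write.
        # Naive split on the first space; an assumption for this example.
        first, _, last = record["name"].partition(" ")
        record["first-name"] = first
        record["last-name"] = last
        changed += 1
    return changed


records = [
    {"name": "Grace Hopper"},
    {"name": "Ada Lovelace", "first-name": "Ada", "last-name": "Lovelace"},
]
print(backfill(records))
print(backfill(records))
```

The old name value is never touched, so the Step 2 fallback keeps working for as long as you want to leave it in place.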

Step 4

Step 3 has survived production for some time, so we stop writing to name and then celebrate total victory.

Real World Example

While the above example was a real rollercoaster of emotion, below is how we actually used the Four Step Process™ to replace our backend infrastructure.

Step 1 - Write to both places

Step 1 was building the Reporting 2.0 Pipeline, which is a story for another post.

Step 2 - Read from the new place, fall back to the old

At this point, we hadn't load-tested our new Redshift cluster. It could ingest data, but would it survive our users' queries? To test this, we:

  • Built a 1.0->2.0 request translation service, as well as a 2.0->1.0 response translation service.

  • Wrote code to put incoming requests behind a feature flag, letting us safely stress-test our Redshift cluster and compare data.

  • Slowly added customers until all reporting queries were running successfully against the new backend.

  • Gave beta testers the option to use the translated 2.0 output, with 1.0 output as the default option. Our beta testers gave us great feedback, and on a per-customer basis we changed the default to 2.0.
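To make the moving parts concrete, here is a small sketch of how a per-customer flag, the two translation layers, and the fallback could fit together. Every name in it (`make_report_handler`, the fake backends, the request and response shapes) is invented for illustration and is not our actual service code.

```python
def make_report_handler(query_v1, query_v2, to_v2_request, to_v1_response,
                        enabled_customers):
    """Build a handler that routes flagged customers to the new backend."""

    def handle(customer_id, request):
        if customer_id not in enabled_customers:
            # Flag off: behave exactly as before.
            return query_v1(request)
        try:
            # Flag on: translate the 1.0 request, query 2.0, translate back.
            return to_v1_response(query_v2(to_v2_request(request)))
        except Exception:
            # Step 2 fallback: if the new place fails, read from the old.
            return query_v1(request)

    return handle


# Fake backends and translators, purely for demonstration.
def fake_v1(request):
    return {"rows": [request["metric"]], "source": "1.0"}


def fake_v2(request):
    return {"data": list(request["metrics"])}


handle = make_report_handler(
    query_v1=fake_v1,
    query_v2=fake_v2,
    to_v2_request=lambda r: {"metrics": [r["metric"]]},
    to_v1_response=lambda r: {"rows": r["data"], "source": "2.0"},
    enabled_customers={"beta-customer"},
)

print(handle("beta-customer", {"metric": "clicks"}))
print(handle("everyone-else", {"metric": "clicks"}))
```

Adding a customer to `enabled_customers` is the whole rollout step, and removing them is the whole rollback.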

Step 3 - Stop reading from the old place

By this point, we had ironed out 99% of the kinks. We were confident enough to roll out a final feature flag: stop sending requests to 1.0. Woo!

However, as ready as we were to kill 1.0, we were still ingesting data into it... juuuuust in case. Patience is hard here, but so, so important.
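One way to picture the gap between Step 3 and Step 4 is as two independent flags: one for reads, one for writes. The sketch below is hypothetical (`MigrationFlags`, `Pipeline`, and the list-backed sinks are stand-ins, not our pipeline), but it shows why turning off reads does not have to mean turning off writes.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationFlags:
    read_from_old: bool = True   # Turned off in Step 3.
    write_to_old: bool = True    # Turned off in Step 4.


@dataclass
class Pipeline:
    flags: MigrationFlags
    old_sink: list = field(default_factory=list)
    new_sink: list = field(default_factory=list)

    def ingest(self, event):
        # The new place always gets the event.
        self.new_sink.append(event)
        # The old place keeps getting it until Step 4, just in case.
        if self.flags.write_to_old:
            self.old_sink.append(event)


pipeline = Pipeline(MigrationFlags(read_from_old=False, write_to_old=True))
pipeline.ingest({"impressions": 1})   # Step 3: both places still written.
pipeline.flags.write_to_old = False
pipeline.ingest({"impressions": 2})   # Step 4: only the new place written.
print(len(pipeline.old_sink), len(pipeline.new_sink))
```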

Step 4 - Stop writing to the old place

A few more months passed without any major incidents, so we terminated our oldest EC2 instances. Reporting 1.0 was officially dead. It felt good, but also somehow sad. That said, we overcame our grief pretty quickly after seeing the next month's Amazon cost graphs.

Final Thoughts

Burning down our Reporting 1.0 Backend was a fun and stressful adventure. We expected many late nights, but, surprisingly, the whole process went smoothly. I attribute this to how much time we spent planning our four steps (as well as having the patience to not immediately switch over).

So, next time you're in a position like this, consider using the Four Step Process™!
