There are a number of popular cloud providers, but migrating your data from one to another can be daunting.
At Adzerk, by default our Data Shipping feature (which gives customers full logs of all ad serving requests) stores info in an AWS S3 bucket. But sometimes our customers request GCP (Google Cloud Platform) instead.
For these clients I built a tool to handle the transfer - and I thought I'd share it publicly, as it's general and powerful enough to be used by anyone migrating data from S3 to GCP.
Weighing the options
I started by weighing three data shipping options:
Lambda Function triggered by a CloudWatch Scheduled Event
While simple, this option required re-syncing every time I needed near real-time data in GCP during the day - or every night, if I were holding analysis for the following day.
Lambda Function triggered by an S3 event
This option was slightly more complicated, as it required having a trigger on an existing S3 bucket. It would be called any time a file was put in the data shipping bucket.
Lambda Function triggered by a CloudWatch Scheduled Event with SQS
This was the most complex option. Data was slow to arrive in GCP, and I wasn’t sure how it would handle failure.
Deciding on a solution
After weighing these options, I decided to proceed with an S3-triggered Lambda job, as it offered the best balance between simplicity and near real-time delivery, the ability to handle scale at a reasonable price, and the ability to ‘upgrade’ if needed.
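To make the S3-triggered approach concrete, here is a minimal Python sketch of how such a Lambda function might read the event it receives. It is not the Connector's actual code: the names extract_s3_objects, handler, and copy_object are hypothetical, and the upload to GCP is left as an injected callable. The event layout (Records, s3.bucket.name, s3.object.key) follows the standard S3 event notification format.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context, copy_object=None):
    """Hypothetical Lambda entry point.

    copy_object is an assumed callable (bucket, key) that would download the
    file from S3 and upload it to GCP; it is injected so the sketch runs
    without cloud credentials.
    """
    copied = []
    for bucket, key in extract_s3_objects(event):
        if copy_object is not None:
            copy_object(bucket, key)
        copied.append(key)
    return {"copied": copied}
```

Because the function runs once per batch of new files, nothing needs to be re-synced on a schedule; each file is shipped shortly after it lands in the data shipping bucket.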
Authenticating in GCP
This step required a service account with limited permissions - similar to the IAM role for AWS. I had to embed JSON credentials in the Lambda package prior to upload.
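As a rough illustration of the embedded-credentials approach, the Python sketch below loads a service account key file that ships inside the Lambda package and checks that it looks like a service account key before use. The function name load_service_account_info and the choice of fields to validate are assumptions for this example, not part of the Connector.

```python
import json

# Fields that a downloaded GCP service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_info(path):
    """Load a service account key file and fail early if it looks wrong."""
    with open(path, encoding="utf-8") as handle:
        info = json.load(handle)
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"key file is missing fields: {', '.join(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"expected a service_account key, got {info['type']!r}")
    return info
```

Failing at load time gives a clearer error than a failed upload later on. Since the key is packaged with the function, keeping the service account's permissions narrow limits the damage if the package is ever exposed.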
Weighing the deployment options
Next, I weighed the deployment options - CloudFormation or Serverless.
CloudFormation is a simple, built-in option. It features native AWS tooling but requires pre-packaged deployment assets, and it doesn’t have native support for S3 events on existing buckets.
Serverless is a cloud-agnostic framework for developing and deploying cloud applications. It also includes handy tools for managing deployments, including different environments. It’s easy to use with an existing S3 bucket, but it requires a third-party application to deploy it.
Deciding on a deployment solution (with a workaround)
You, too, can flip your data shipping from AWS to GCP by following these 10 steps.
10 steps to replicating and shipping your user data from AWS to GCP:
AWS to GCP set-up: 6 steps
Step One: Ensure that you have the Make and Zip command line tools installed.
On macOS, you'll need to install the Xcode Command Line Tools using:
xcode-select --install
and Zip via Homebrew using:
brew update && brew install zip
If you're using an Ubuntu-based Linux distribution, you can run:
sudo apt update && sudo apt install zip build-essential
Step Two: Next you'll need the AWS CLI. You can find installation and configuration instructions in the AWS CLI documentation.
Once installed and configured, create an S3 bucket to store your Lambda source bundles by running:
aws s3 mb s3://your-globally-unique-lambda-source-bucket-name
Step Three: Enable and configure data shipping for your network
Step Four: Now we need somewhere for the data to go. If you already have a GCP Storage bucket created, great, you can skip to creating the Service Account Key. If not, go ahead and create one in your account.
Step Five: Access or create your Service Account Key, which the Connector will use to write JSON files to your bucket (see note below)
Step Six: Save the key locally and set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the JSON file. (You’ll need to remove any spaces in your JSON token filename for the make tasks.)
NOTE: For Adzerk customers, data shipping may re-write files using the same filename. To create new files in Google Cloud, grant the Storage Object Creator role when generating your token. You can allow the Connector to overwrite files by designating it Storage Object Admin.
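Before running the make tasks, it can help to confirm that Step Six was done correctly. The Python sketch below is one way to do that. The helper check_credentials_path is hypothetical; it only checks that GOOGLE_APPLICATION_CREDENTIALS is set, that the filename has no spaces, and that the file exists.

```python
import os


def check_credentials_path(environ=None):
    """Return a list of problems with GOOGLE_APPLICATION_CREDENTIALS (empty if none)."""
    environ = os.environ if environ is None else environ
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    problems = []
    # The make tasks need a key filename without spaces.
    if " " in os.path.basename(path):
        problems.append("key filename contains spaces")
    if not os.path.isfile(path):
        problems.append("key file does not exist")
    return problems


if __name__ == "__main__":
    for problem in check_credentials_path():
        print(f"warning: {problem}")
```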
AWS to GCP installation: 4 steps
Step One: Gather the following values for deployment:
- Stack name
- Lambda source bucket (to create S3 events on existing S3 buckets)
- Source bucket
- Destination bucket
Step Two: Start your deployment
Step Three: Create the stack:
make create STACK_NAME=your-stack-name \
  LAMBDA_SOURCE_BUCKET=your-source-bucket \
  SOURCE_BUCKET=your-adzerk-data-bucket \
  DESTINATION_BUCKET=your-gcp-storage-bucket
Step Four: Update the stack:
make update STACK_NAME=your-stack-name \
  LAMBDA_SOURCE_BUCKET=your-source-bucket \
  SOURCE_BUCKET=your-adzerk-data-bucket \
  DESTINATION_BUCKET=your-gcp-storage-bucket
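If you script your deployments, the create and update commands can be assembled from the four values gathered in Step One. The Python sketch below shows one way to do that. The helper build_make_command is hypothetical; it only builds the argument list and checks that nothing is missing, leaving the actual make invocation to you.

```python
import shlex

# The four variables the make tasks expect, in the order used above.
REQUIRED = ("STACK_NAME", "LAMBDA_SOURCE_BUCKET", "SOURCE_BUCKET", "DESTINATION_BUCKET")


def build_make_command(task, values):
    """Build the argument list for 'make create' or 'make update'."""
    if task not in ("create", "update"):
        raise ValueError(f"unsupported task: {task!r}")
    missing = [name for name in REQUIRED if not values.get(name)]
    if missing:
        raise ValueError(f"missing values: {', '.join(missing)}")
    return ["make", task] + [f"{name}={values[name]}" for name in REQUIRED]


if __name__ == "__main__":
    command = build_make_command("create", {
        "STACK_NAME": "your-stack-name",
        "LAMBDA_SOURCE_BUCKET": "your-source-bucket",
        "SOURCE_BUCKET": "your-adzerk-data-bucket",
        "DESTINATION_BUCKET": "your-gcp-storage-bucket",
    })
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```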
Results: New architecture and functions
The CloudFormation template will create new Lambda functions and infrastructure in AWS:
- New S3 events on existing S3 buckets
- New S3 events for your data shipping bucket
- New functions and S3 events to ship new files from S3 to GCP
- All necessary IAM permissions
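For readers curious what an S3 event on an existing bucket looks like, the Python sketch below builds a notification configuration in the shape accepted by the S3 API (for example, boto3's put_bucket_notification_configuration). This is an illustration only: how the template itself attaches the event is not shown here, and the helper name build_notification_config and the optional prefix filter are assumptions.

```python
def build_notification_config(lambda_arn, prefix=""):
    """Build an S3 notification configuration that invokes a Lambda on new objects."""
    config = {
        "LambdaFunctionArn": lambda_arn,
        # Fire for every kind of object creation (PUT, POST, copy, multipart).
        "Events": ["s3:ObjectCreated:*"],
    }
    if prefix:
        # Optionally limit the trigger to keys under a given prefix.
        config["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"LambdaFunctionConfigurations": [config]}
```

Note that setting a bucket's notification configuration through the API replaces the whole existing configuration, so any triggers already on the bucket would need to be merged in first.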
Once your new Google Cloud Storage bucket is set up and receiving data, you can also set triggers to alert on new data and execute further processing logic.
Want to learn more?
View our GitHub guide, and share your data migration experiments and experiences in the comments below. We’d love to hear what’s working well for you - and what other topics you’d find helpful.
Questions or feedback about Adzerk data shipping? Your Account Manager is always glad to hear from you!