Deep Dive Into Data
One of the API projects we’re currently working on involves giving customers deeper visibility into how their games use our SDK services (because data visualization can kick some serious ass). While exploring options on how to accomplish this, we had the crazy notion of logging everything. I’m not talking about summary statistics. I’m talking every API call, every request, every single byte of a customer’s game passing through our systems. Obviously, our recent development sprints have focused on the ability to efficiently and intelligently handle this load when we start routing production traffic through our new analytics system.
High load is certainly not something new to us. Each time we work on an API, we need to ensure it can withstand the abuse and still provide excellent response times. But at the scale we have planned, we found ourselves in a whole new ballpark. Handling 8 billion requests per month was one thing. Logging detailed data about each request was another.
On the architecture side of this project, we’ve decided to explore a NoSQL implementation for storing all of this analytics data, but I’ll leave that topic for another blog post. Populating the stage environment with “production-like” traffic, however, became a challenge: there isn’t a quick and easy way to generate ~3,000 requests per second without purchasing some really expensive tool (like BrowserMob) or spending a sprint jury-rigging our own toolset.
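For context, that ~3,000 requests per second target falls straight out of the monthly volume. A quick back-of-the-envelope check (assuming a 30-day month):

```python
# 8 billion requests per month, spread evenly over a 30-day month,
# works out to roughly 3,000 requests per second.
monthly_requests = 8_000_000_000
seconds_per_month = 30 * 24 * 3600  # 2,592,000 seconds

requests_per_second = monthly_requests / seconds_per_month
print(round(requests_per_second))
```

Real traffic is bursty rather than evenly spread, so sustained peaks will run higher, but ~3,000 rps is a reasonable floor for the load generator.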
GitHub is your friend…
Trawling through GitHub a few days ago, we came across an interesting project that caught our eye. The project, incubated by the News Applications Team at the Chicago Tribune and aptly named “Bees with machine guns”, seemed to be exactly what we were looking for. We did need to make some minor tweaks (see below); you can check out our fork here.
The tag line is hilarious: “A utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications)”. Built on Python on top of Boto, the tool was fairly straightforward to use: set up your AWS credentials, spin up a handful of bees, get some beer (while the EC2 instances boot), and point the bees at the URL you’re going to attack (load test). Sit back and watch the ensuing self-inflicted DDoS attack (load test).
“Bees with machine guns” effectively runs Apache Benchmark from numerous EC2 instances concurrently, so it’s built around a pre-built AMI running a base Ubuntu environment with ‘apache2-utils’ installed (which gives you ab). For our fork, we’ve implemented the ability to make POST requests with a payload file (the original only did GETs). We’ve also rolled our own AMI, in case we want to replace Apache Benchmark with something more session-based like SoapUI.
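To make the GET-versus-POST distinction concrete, here’s a minimal sketch of how the underlying ab invocation changes. The -n, -c, -p, and -T flags are real Apache Benchmark options, but build_ab_command and the URLs are hypothetical illustrations, not code from the original project or our fork:

```python
def build_ab_command(url, requests=1000, concurrency=100,
                     post_file=None, content_type="application/json"):
    """Build an Apache Benchmark (ab) command line.

    With no post_file, ab issues GET requests (the original bees
    behavior). Passing a payload file adds -p (POST body file) and
    -T (Content-Type header), switching the run to POST requests.
    """
    cmd = ["ab", "-n", str(requests), "-c", str(concurrency)]
    if post_file:
        cmd += ["-p", post_file, "-T", content_type]
    cmd.append(url)
    return " ".join(cmd)

# GET-style run, as in the original project:
print(build_ab_command("http://stage.example.com/api/log"))

# POST-style run with a payload file, as in our fork:
print(build_ab_command("http://stage.example.com/api/log",
                       post_file="payload.json"))
```

Each bee runs a command along these lines against its slice of the total request count, which is why POST support only required threading the payload file and content type through to ab.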
Picking your battles
Taking a look at what’s currently out in the open-source ecosystem ultimately saved our butts. We were able to compress the time it would have taken us to flesh out a fully functional distributed load-testing environment by simply finding something already done and expanding on it. Plus, it’s totally awesome to sit there with a pint of beer as you watch bees with machine guns swarm your services in a self-inflicted DDoS.