Online Matters: Transcending the Single-Player Experience

Category: Engineering

Blog posts from the Engineering team at GameSpy Technology


Growing Pains: They Hurt So Good


First, housekeeping: there will be free beer at the end of this discussion. Now that I have your attention, let’s get started.

For those of you following our exploits, our Principal Systems Engineer (and Resident Cthulhu Cultist) Ryan Creasey mentioned in a previous post – "Arming a Bee with a Machine Gun" – that we've been actively working on logging all of the data that runs through our systems, and we track a lot of data (8 billion requests per month). The goal here: provide real-time analytics to our customers. Simple, right?

Let me step back and give some context – since the launch of GameSpy Open we've been on a path to continually improve the developer experience with our products and services. We've been actively working on improving platform integration, simplifying the process of integrating our SDKs, and, of course, bringing more awesome to our Developer Dashboard – which is the impetus for this blog post.

[Comic: The beauty of data visualization (xkcd.com)]

Now, back to the data. The long and short of it is that we outgrew the analytics system we initially chose to handle all the data we collect for the games that use our services – and we outgrew it MUCH faster than we'd ever anticipated. Shortly after the launch of Open, it became abundantly clear that our analytics data would soon be too big for that system's britches.

With that goal in mind, we set out to build a better analytics system – one that could keep up with the heavy load over the long haul.

We Need More Powah…

Our first pass at building this new analytics system leveraged an existing web service infrastructure written in C# on .NET 4.0/WCF. For hardware, we started with two Windows Server 2003 quad-core boxes with 2GB of RAM apiece. Our initial concern was the data-store bottleneck – the issue behind the aforementioned growing pains. Of course, the NoSQL world has already tackled this problem; the TL;DR (we'll dig into the details in a future post) is that we settled on MongoDB for our data store.

We then wrangled up some really pissed-off bees, which proceeded to annihilate our systems, Max-Payne-style. What we found is that our data store was no longer the bottleneck – Mongo was kicking ass and taking names.

Instead, we began hitting limits on our web tier. Our initial tests never even reached our target of 3K reqs/sec (roughly what we see from production load across all of our current titles) – the two boxes we spun up for the proof of concept capped out at ~2,400 reqs/sec. Worse, those requests took 3-5 seconds to complete on average; insufficient, to say the least, and unacceptable for a real-time system.

Next Song!

We knew that we needed something different; better, faster, stronger – if you will. Our first thought was to evaluate something from the Java world, which would be an easy paradigm shift from C# and could leverage frameworks such as Play or Netty.

Diving a bit further, we stumbled upon an interesting article about Goliath/EventMachine that piqued our interest. The framework leverages Ruby 1.9 Fibers – lightweight, cooperatively scheduled, thread-like structures that can pause and resume on a whim – to build a highly efficient web server for handling concurrent load.
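To make that concrete, here's a minimal sketch of what a Goliath endpoint looks like – the class name, fields, and port are illustrative, not our actual analytics API:

    require 'goliath'
    require 'json'

    # Goliath picks up this class automatically when the file name matches it
    # (analytics_ingest.rb -> AnalyticsIngest). Run with: ruby analytics_ingest.rb -p 9000
    class AnalyticsIngest < Goliath::API
      use Goliath::Rack::Params   # parses query-string/form params into env['params']

      def response(env)
        # Each request runs inside its own Ruby 1.9 Fiber; a non-blocking call to
        # the data store (via an EventMachine-aware driver) would pause this Fiber
        # and let the reactor keep serving other requests in the meantime.
        event = { 'path' => env['PATH_INFO'], 'params' => env['params'], 'received_at' => Time.now.to_f }
        [200, { 'Content-Type' => 'application/json' }, JSON.dump(event)]
      end
    end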

I find your lack of faith (in other languages) disturbing

You might ask yourself: Ruby… really? Believe me, we were skeptical as well. On the one hand, Java is tried and true and has reliable (albeit bloated) libraries to choose from. Ruby, on the other hand, provided the agility we needed to get a smaller, modular piece of our infrastructure done quickly. And, with the increasing number of Ruby projects available on GitHub, it also offered a great deal of extensibility. With this, we unleashed Goliath.

Drum Roll Please…

Now, on the Linux front, we used two equivalent quad-core, 2GB RAM XenServer VMs running CentOS 5 to keep our tests as even as possible. Since each Goliath process runs a single reactor to handle requests, we ran multiple reactor processes per box and put HAProxy in front to balance the load across them. Pro tip: ten reactors per machine gave us optimal results, but experiment to see what works best for you.
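As a rough illustration of that setup – script name, ports, and log paths are assumptions, not our actual deployment – a launcher along these lines starts the ten reactors that HAProxy then round-robins across:

    require 'fileutils'

    BASE_PORT     = 9000
    REACTOR_COUNT = 10

    FileUtils.mkdir_p('log')

    REACTOR_COUNT.times do |i|
      port = BASE_PORT + i
      # One Goliath reactor per process, each listening on its own port behind HAProxy.
      pid = spawn('ruby', 'analytics_ingest.rb', '-e', 'production', '-p', port.to_s,
                  out: "log/reactor_#{port}.log", err: [:child, :out])
      Process.detach(pid)
      puts "reactor #{i + 1}/#{REACTOR_COUNT} listening on 127.0.0.1:#{port} (pid #{pid})"
    end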

Speaking of which, those results were astounding (especially for some of our harder-core Ruby skeptics). Our load tests showed the new Goliath endpoint handling 4,500 reqs/sec with a 200ms average response time. Holy crap! This knocked the socks off our previous go 'round – nearly double the throughput, with response times better by an order of magnitude – even running on equivalent hardware. Now granted… 200ms is still not ideal, but it's a huge step in the right direction.

Papa Loves Mongo

But we weren't out of the woods yet. If you haven't played with MongoDB yet, a word of warning: Mongo is not like other NoSQL offerings. If configured improperly, it will bite you. Hard. We've had our fair share of S&M, but this was neither the time nor the place – we were stone cold sober and had a deadline to meet. After poring over the Mongo wiki and Google group, and getting some online validation, we came up with an approach that worked for us: one that could scale horizontally while maintaining a redundant backup.

We settled on a configuration of six nodes for the main cluster: three replica sets with two nodes apiece, with each replica set serving as one shard of the sharded cluster – e.g. with the nodes grouped (1-2-3) | (4-5-6), each set pairs a node from one group with its counterpart in the other. A final, seventh node acts as the hidden backup for every replica set: it runs three mongod instances plus a configsvr, backing up all data in case of zombie apocalypse (only a matter of time). The important part to note is that this third member gives each replica set an odd number of nodes, so an arbiter is unnecessary. Furthermore, the configsvr lives on this backup node, keeping it insulated from failures in the primary cluster serving traffic. Check out our MongoDB Chef Cookbook if you want to replicate this setup.
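For the curious, here's the same layout sketched as the config documents you'd feed to rs.initiate() on each set – hostnames and ports are placeholders, not our real machines:

    # Six "main" nodes paired across the two groups into three shards, plus one
    # backup box that hosts a hidden, priority-0 member of every set. Hidden
    # members still vote, which is what gives each set its odd member count and
    # makes an arbiter unnecessary.
    BACKUP_HOST = 'mongo-backup.internal'

    REPLICA_SET_CONFIGS = (1..3).map do |n|
      {
        '_id'     => "shard#{n}",
        'members' => [
          { '_id' => 0, 'host' => "mongo#{n}.internal:27017" },
          { '_id' => 1, 'host' => "mongo#{n + 3}.internal:27017" },
          { '_id' => 2, 'host' => "#{BACKUP_HOST}:#{27016 + n}",
            'hidden' => true, 'priority' => 0 }
        ]
      }
    end

    REPLICA_SET_CONFIGS.each { |cfg| puts cfg.inspect }  # pass each to rs.initiate(cfg)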

You Talk Too Much

Thanks! The result of all this effort is an updated Developer Dashboard (currently in the final stages of QA) with working metrics that are "super speedy" – yes, it's a technical term. We also have a lightweight, modular, highly optimized, horizontally scalable analytics web service, and a distributed load-test harness to boot. Not too shabby. We're not done, of course – there's still more to come. Join us next time for a deeper dive into the infrastructure behind this system and our plans for the future. Plus, we'll have more beer.




Giving a Bee a Machine Gun


Deep Dive Into Data

One of the API projects we're currently working on involves giving our customers deeper visibility into how their games use our SDK services (because data visualization can kick some serious ass). While exploring options for accomplishing this, we had the crazy notion of logging everything. I'm not talking about summary statistics. I'm talking about every API call, every request, every single byte a customer's game sends through our systems. Obviously, our recent development sprints have focused on handling this load efficiently and intelligently once we start routing production traffic through our new analytics system.

High load is certainly nothing new to us. Every time we work on an API, we need to ensure it can withstand the abuse and still deliver excellent response times. But at the scale we have planned, we're in a whole new ballpark: handling 8 billion requests per month is one thing; logging detailed data about each of those requests is another.

On the architecture side of this project, we've decided to explore a NoSQL implementation for storing all of this analytics data, but I'll leave that topic for another blog post. Populating the staging environment with "production-like" traffic, however, became a challenge: there isn't a quick and easy way to generate ~3,000 requests per second without purchasing a really expensive tool (like BrowserMob) or spending a sprint doing some crazy jury-rigging of our own toolset.

GitHub is your friend…

Trawling through GitHub a few days ago, we came across an interesting project that caught our eye. The project, incubated by the News Applications Team at the Chicago Tribune and aptly named "Bees with machine guns", seemed to be exactly what we were looking for. We did need to make some minor tweaks (see below) – you can check out our fork here.

The tagline is hilarious: "A utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications)". Built in Python on top of Boto, it's fairly straightforward to use: set up your AWS credentials, spin up a handful of bees, get some beer (while the EC2 instances boot), and point the bees at the URL you're going to attack – er, load test. Then sit back and watch the ensuing self-inflicted DDoS attack – er, load test.
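In practice the workflow looks something like this – the flags come from the upstream README, while the key pair, security group, and target URL are placeholders:

    # spin up four bees (micro EC2 instances) in your account
    bees up -s 4 -g public -k my-ec2-keypair

    # 10,000 total requests at 250 concurrent connections, split across the bees
    bees attack -n 10000 -c 250 -u http://staging.example.com/api/endpoint

    # call off the swarm (and stop paying for the instances)
    bees down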

Since "Bees with machine guns" is effectively running ApacheBench from numerous EC2 instances concurrently, it's all built around a pre-built AMI running a base Ubuntu environment with 'apache2-utils' installed (giving you ab). For our fork, we've implemented the ability to make POST requests with a payload file (the original only did GETs). We've also rolled our own AMI, in case we want to replace ApacheBench with something more session-based like SoapUI.
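Under the hood, each bee ends up shelling out to ab; for our POST case that boils down to an invocation roughly like this one (URL and payload file are illustrative):

    ab -n 1000 -c 100 -p payload.json -T 'application/json' http://staging.example.com/api/events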

Picking your battles

Taking a look at what's currently out in the open source ecosystem ultimately saved our butts. We were able to compress the time it would have taken to flesh out a fully functional distributed load-test environment by simply finding something already built and expanding on it. Plus, it's totally awesome to sit there with a pint of beer as you watch bees with machine guns swarm your services in a self-inflicted DDoS.



Making Technology Work for Millions of Gamers


We here at GameSpy have been hunkered down bringing connected gaming to the masses for over 10 years now. We've dealt with many technical challenges along the way – some of our solutions were successful right out of the gate, while others required us to rethink our design, architecture, and infrastructure.

By now you're probably thinking to yourself, "Who the heck are you?" So, let me introduce myself. My name is Mike Ruangutai and I'm the lead code geek here at GameSpy Technologies. More specifically, I manage the engineering team responsible for the connected gaming services powering the thousand or so titles we've serviced to date. Like many of you, I'm fascinated by technology and have dabbled in many flavors of it over the years, moving from C++ to the early incarnations of Java and J2EE, then to .NET, and now to Ruby, NoSQL, and all manner of open source technologies.

So, back to GameSpy. Let’s start with where we are today …

This may sound heretical to some of you, but we've primarily been – and still are – a Microsoft shop running the various flavors of .NET, SQL Server, and Windows Server. That isn't to say we haven't been successful with the platform – we have. They've got great products that address the technology needs of thousands of organizations.

But …

Looking at the technology landscape today, we find that our needs align more closely with the likes of Netflix, Twitter, and other big-data/big-traffic shops. We service nearly 9 billion API requests per month and have gobs of data coming into our infrastructure daily. And like those guys, we keep hitting scalability and availability walls where Microsoft has failed to innovate effectively. So what is a technology shop to do? As you may have guessed, we find ourselves embracing the always-innovative open source community.

Great. So what does all this mean?

Well, we're going to start using this blog to share the lessons we've learned on the Microsoft stack, as well as to document our evolution to the new world for all to see. In the hopper we've got Ruby on Rails, Sinatra, MongoDB, Hadoop, Chef, Netty, the Play Framework … the list goes on. We've already put some of these technologies into production, and we're happy to impart the knowledge and insight we've gained to our engineering compatriots.

Stay tuned! The party has just started.


