Tools in My Designer Tool Belt: A/B Testing (part 1)

Taking questions at the end of my presentation. Thanks to my friend Jonathan Arena for the pic.

Earlier this month I had the opportunity to present my experience learning how to make A/B testing a part of my "designer tool belt" at SXSW in Austin, TX. It was my first time presenting at SXSW, and I have to tell you it was a bit intimidating. The room was meant to hold about 500 people, and in it were about a dozen Netflix colleagues, including my boss, plus another dozen or so friends & acquaintances. They were all there to support me, and I didn't want to disappoint any of them. Strangers? Pfff. ;-) 

It took a significant amount of time over the preceding few weeks to pull it together. My ratio for a presentation like this is about 1:40-50, meaning that a one-hour presentation requires between 40 and 50 hours of preparation, from crafting the story, to creating the presentation, to practicing. The first thing was to figure out a story worth listening to amongst the cacophony of talks happening at SXSW. Not easy. This involved lots of thinking, research, and Post-its. 

This was my first complete outline using my Post-it method. This was taken as a panoramic shot of my desk so it looks somewhat warped, but you get the idea.

Once I felt like I had a basic outline, I followed it with research (both from my own past and the world out there), gathering content, pulling it all together into a draft presentation, and then doing a dry run with people from work. One of the strange things about me and presentations is that I can't "perform" at a dry run, but it was necessary to get some good initial feedback. That feedback resulted in significant changes (thanks, guys!). Then came running through it with my wife at home, working on it some more, doing another dry run with a few more folks (same problem), tweaking some more, practicing in my hotel room, and finally showing up in the ready room an hour before my scheduled time. 

I ended up pretty happy with how it all came together. I'll share parts of that story and presentation over a couple of blog posts. 

•••••

As an art & design college graduate, I spent the first dozen years of my career capitalizing and building on my "tools for creating" while learning on the job a set of complementary "tools for understanding". These tools ranged from usability testing, ethnographic research, and playtesting to focus groups and interviews. I learned by working with experts in their field and studying their methods. 

These methodologies gave me various ways to better understand what people say and what they do. With these tools, I was able to design enterprise products; PC, ARG, and console games; entertainment platforms like the Xbox 360; a startup's web & mobile MVP; and some future-looking new product innovation projects. I thought I was putting these tools to good use. 

Then around 2008 a couple things shook me from my self-satisfaction. 

•••••

Google couldn't decide which shade of blue to use for ad links, so they tested 41 different shades of blue to see which one was most effective. Google's lead visual designer at the time, Doug Bowman, publicly rage-quit over this and other similar decisions on his blog:

I can’t operate in an environment like that. I’ve grown tired of debating such minuscule design decisions. There are more exciting design problems in this world to tackle.
— Doug Bowman, former Google Visual Design Lead

(╯°□°)╯︵ ┻━┻

However, the winning shade of blue has reportedly contributed an extra $200M a year in revenue since. Holy crap! 

The historic Obama '08 presidential run also made news for its online campaign. The team ran hundreds of experiments to optimize the campaign's homepage, with stunning results:

The winning variation had a sign-up rate of 11.6%. The original page had a sign-up rate of 8.26%. That’s an improvement of 40.6% in sign-up rate. What does an improvement of 40.6% translate into?

Well, if you assume this improvement stayed roughly consistent through the rest of the campaign, then we can look at the total numbers at the end of the campaign and determine the difference this one experiment had. Roughly 10 million people signed up on the splash page during the campaign. If we hadn’t run this experiment and just stuck with the original page that number would be closer to 7,120,000 signups. That’s a difference of 2,880,000 email addresses.

Sending email to people who signed up on our splash page and asking them to volunteer typically converted 10% of them into volunteers. That means an additional 2,880,000 email addresses translated into 288,000 more volunteers.

Each email address that was submitted through our splash page ended up donating an average of $21 during the length of the campaign. The additional 2,880,000 email addresses on our email list translated into an additional $60 million in donations.
— Dan Siroker, former Director of Analytics, Obama '08 Campaign
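
If you want to sanity-check that arithmetic, here's a quick Python sketch that reproduces it using only the numbers from Dan's quote. His assumption that the lift held roughly constant through the campaign is baked in, and the small rounding differences are mine.

```python
# Rough check of the Obama '08 splash-page numbers quoted above.
# All inputs come straight from the quote; results are approximate due to rounding.

baseline_rate = 0.0826   # sign-up rate of the original page
winner_rate = 0.116      # sign-up rate of the winning variation

lift = winner_rate / baseline_rate - 1
print(f"Relative improvement: {lift:.1%}")                # ~40%, quoted as 40.6%

total_signups = 10_000_000                                # signups with the winning page
baseline_signups = total_signups / (1 + lift)             # what the original page would have produced (~7,120,000)
extra_signups = total_signups - baseline_signups
print(f"Extra email addresses: {extra_signups:,.0f}")     # ~2,880,000

volunteer_rate = 0.10                                     # email-to-volunteer conversion
avg_donation = 21                                         # average dollars donated per address
print(f"Extra volunteers: {extra_signups * volunteer_rate:,.0f}")  # ~288,000
print(f"Extra donations:  ${extra_signups * avg_donation:,.0f}")   # ~$60 million
```

Run it and you land, give or take rounding, on the same 2.88 million extra addresses, 288,000 volunteers, and $60 million he cites.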

A part of me delighted in Doug's words and actions; there's something admirable about someone sticking to their convictions like that. But these two events made me intensely curious about the methodology used in both of them: something called A/B testing. 

Umm, what? 

(to be continued...)