The Beauty of Seasonality in Data

When dealing with data, many companies struggle with seasonal fluctuations.
Some of the most common seasonal fluctuations I’ve seen are due to:

  • Holidays – like Christmas, Thanksgiving, Labor Day, etc.
  • Hallmark Holidays – like Valentine’s Day, Mother’s Day, etc.
  • Traditions – like New Year’s resolutions, etc.
  • School-related events – like spring & summer breaks, football season, homecoming, proms, school year starts, graduations, reunions, etc.
  • Actual seasonal changes – like winter cold, summer heat, the rainy season, etc.

While some people see seasonality as the bane of their existence, I see it another way. For me…

SEASONALITY = OPPORTUNITY

The way I see it, the more complex something gets, the bigger the opportunity for those who can handle it, because the barrier to entry is high and the number of savvy companies that can figure out how to deal with it properly is small.

Why Seasonality is Difficult
Two primary reasons: 1) Seasonality mucks up data and 2) obtaining 3rd party seasonal data is not easy.

In the first case, companies that don’t learn to deal with it correctly will make poor decisions. Not realizing that Easter falls in March next year could cause you to mimic this year’s activity 2-3 weeks late, but not realizing that Easter was the cause of this year’s increased sales is an even bigger missed opportunity. Always pay attention to peaks and valleys for your site, or for sections of your site, to better understand the human behavior that causes such fluctuations. If you can figure out why, then you have a competitive advantage going into the next time that event may occur.
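One simple way to surface those peaks and valleys is to flag days that deviate sharply from a trailing baseline. Here is a minimal sketch in Python with pandas; the series name, window and threshold are hypothetical, not from any particular analytics package:

    import pandas as pd

    def flag_anomalies(visits: pd.Series, window: int = 28, threshold: float = 0.25) -> pd.DataFrame:
        """Flag days that deviate more than `threshold` from a trailing average."""
        # Baseline excludes the current day so a spike doesn't inflate its own baseline.
        baseline = visits.shift(1).rolling(window, min_periods=window // 2).mean()
        deviation = (visits - baseline) / baseline
        out = pd.DataFrame({"visits": visits, "baseline": baseline, "deviation": deviation})
        out["flag"] = deviation.abs() > threshold  # candidate peak or valley to investigate
        return out

Flagged dates are just candidates; the point is to annotate each one against your calendar of holidays and site events, so next year’s plan starts from known causes.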

In the second case, sure, there is Google Trends if you know in advance what terms to look at (which is usually not the case), but if you were to jump into Google AdWords or any keyword tool wanting to know what will be a hit two or three months into the future, you can’t rely on average data or what happened last month. I’ve found great value in services like Hitwise (though very expensive), where I can look at keyword trending over time and see what keywords drove traffic to my competitors 13 months ago.

How to Deal With Seasonality
To best tackle seasonality, mine your own data and mine external industry data. Look for monthly, weekly and even daily fluctuations. Keep tabs on w/w, m/m and y/y growth rates. Also keep a calendar of events that may cause fluctuations in your data (e.g., a site redesign), so you don’t mistakenly attribute a fluctuation to something else. And when you can identify a source for the seasonality, make an action plan for next year.
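Those growth rates are cheap to compute once you have a daily series. A minimal sketch, assuming `daily` is a pandas Series of visit counts indexed by date (the names here are hypothetical):

    import pandas as pd

    def growth_rates(daily: pd.Series) -> dict[str, pd.Series]:
        """Compute w/w, m/m and y/y growth rates from daily visit counts."""
        weekly = daily.resample("W").sum()    # calendar-week totals
        monthly = daily.resample("MS").sum()  # calendar-month totals
        return {
            "w/w": weekly.pct_change(1),    # vs. the previous week
            "m/m": monthly.pct_change(1),   # vs. the previous month
            "y/y": monthly.pct_change(12),  # vs. the same month last year
        }

Note that m/m and y/y computed on raw monthly totals inherit the days-in-the-month bias linked at the end of this post; normalizing each month to a daily average avoids it.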

Understanding your seasonality is the first part; acting upon the intelligence is the second. For example, if a weight loss company discovers that summer high school reunions cause a burst of new customers in June, it should target its late-May-to-June advertising at reunion-planning destinations like Facebook, Classmates, Yahoo Groups and reunions.com, or bid on long-tail local reunion terms like “ehs 2007 reunion” (note: not a single advertiser has figured this one out yet) or “roosevelt high school 1994 class reunion”. It could even create a special plan targeted at those customers (Rapid Reunion Weight-Loss Program).

To be fair, many companies, especially in retail, have seasonality built into their veins, but even these companies could easily improve if they understood exactly what drives people’s interest and exactly when.

If you are in a business affected by seasonality, be happy that your data has a pulse:
[Chart: seasonal data]

instead of a flat line like this (call life support, we’ve got a flatliner):
[Chart: non-seasonal data]
For those fascinated by seasonality, I recommend reading Click by Bill Tancer.

For more false-seasonality fun, read: Days in the Month Bias for Analytics

Web Analytics Provider Data Tested

I love it when someone takes the time to research and answer a question that many people have. In this case, Stone Temple decided to put several web analytics providers to the test by installing multiple solutions on a few sites to see the differences. Their results can be seen here. Be sure to read the whole report because some of the differences were due to implementation mistakes.

Here are what I found to be the key findings:

  • Be prepared for different numbers whenever switching analytics packages. None seem to count the data in the exact same way.
  • The 3rd-party cookie deletion rate exceeds the 1st-party cookie deletion rate by about 13%. More proof that you shouldn’t use 3rd-party cookies.
  • WebTrends, ClickTracks and Google Analytics may overcount uniques, while WebSideStory (HBX) and Unica may undercount.
  • ClickTracks may severely undercount page grouping data.

Potential flaws with the study:

  • Just four sites were used, pre-screened for having large enough paid search spend.
  • Of the four sites, none are high-traffic sites. I could only find two of them in ComScore, and the site with the most traffic only sees about 200k U.S. visitors a month. I’d love to see the same study on sites with more visitors, which would make the data much more reliable.
  • In an effort to “protect” the participating sites from sharing their real traffic volume data, the time period for the daily uniques comparison was not disclosed. On top of that, each analytics package probably has different rules on what constitutes a daily unique (see the sketch after this list). For example, some may cut off a “visit” at midnight but let it carry into the next day as another unique “visit,” while others may not. Some may also expire visits after different periods of inactivity (30 minutes, etc.). I would have liked to see a weekly or monthly uniques comparison instead.
  • When I first heard of this study, I was excited that we might finally learn a lot about the different providers and which is the best solution, but I was a bit disappointed when the results were released. It sounds like we may learn more when the final results come out, though that may be more along the lines of implementation findings. I hope it inspires more people to run more tests.
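To see how much the definition of a “visit” matters, here is a minimal sketch of sessionization with a 30-minute inactivity timeout and an optional midnight cutoff. This illustrates the general technique, not any vendor’s actual logic; the data layout is hypothetical:

    from datetime import datetime, timedelta

    TIMEOUT = timedelta(minutes=30)  # expire a visit after 30 minutes of inactivity

    def count_visits(hits: list[tuple[str, datetime]], midnight_cutoff: bool = False) -> int:
        """Count visits from (visitor_id, timestamp) hits, each visitor's hits in time order."""
        visits = 0
        last: dict[str, datetime] = {}  # visitor_id -> timestamp of that visitor's previous hit
        for visitor, ts in hits:
            prev = last.get(visitor)
            new_visit = (
                prev is None                    # first hit seen from this visitor
                or ts - prev > TIMEOUT          # inactivity timeout exceeded
                or (midnight_cutoff and ts.date() != prev.date())  # split the visit at midnight
            )
            if new_visit:
                visits += 1
            last[visitor] = ts
        return visits

Run the same hit log through both settings and the totals differ; that alone can explain part of the gap between packages.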

Please Stop Quoting Alexa Data

Far too often I hear people quoting Alexa data. Even last week, at the 2007 Omniture Summit I witnessed Tim O’Reilly using Alexa charts to prove Web 2.0 success in front of 1,000 smart web analytics professionals. I know I couldn’t have been the only person in the crowd to notice. For Tim’s benefit, and anyone else who uses Alexa Data, please take note:

ALEXA DATA IS TREMENDOUSLY FLAWED

I touched on this in a competitive intelligence metrics post back in October, showing that Alexa’s data is less accurate at determining true site traffic than the number of characters in the domain name, but now I’d like to really illustrate how far off Alexa’s data is.

Many people have pointed out Alexa’s data is biased towards a certain crowd and can be manipulated (see the links at the bottom of this post), but none have illustrated the margin of error that I’m about to. Below I take a look at two very different sites with very different traffic stats.

Site 1: Allrecipes – Allrecipes is a leading food site – as you might expect, Allrecipes users are similar to what you might see on the Internet as a whole, though slightly more female.

Site 2: SEOMoz – SEOMoz is a site that caters to the SEO and online marketing community – a crowd more likely to install the Alexa toolbar.

Using Alexa, you might conclude that SEOMoz receives more traffic than Allrecipes:

Alexa Reach Chart:

Alexa Rank Chart:

Both sites are very popular within their target audiences, but despite what Alexa may show, Allrecipes has much more traffic. Let’s face it, more people cook food than perform SEO! In fact, if you were to populate the above charts with actual data, SEOMoz would be a flat sliver near the x-axis. Here’s some real data from Dec. ’06:

Allrecipes Unique Visitors: 11,023,187
SEOMoz Unique Visitors: 102,523

If you were to use Alexa charts to draw conclusions about either site based on real numbers for one site, your traffic estimates would be off by approximately 11,842%. Numbers that big are often difficult to grasp, so I like to put them in perspective. A mistake of that magnitude is the equivalent of:

  • The CIA mixing up the population of Ohio with that of China.
  • Your accountant saying you owe $1,000 to the IRS, when you really owe $119,417.
  • A cop pulling you over for doing 60 in a 30, when you were really going half-a-mile-per-hour.
  • Telling your spouse you’ll be home in three hours, then showing up 15 days later.

These are mistakes that none of us could get away with, so why should we let Alexa?

I’m not the first to prove Alexa data is flawed. Here are links to other Alexa skeptics:
Peter Norvig, Paul Stamatiou, Josh Pigford, Matt Cutts, Rand Fishkin (thanks for the data!), Greg Linden, Bruce Stewart, Alex Iskold, John Chow, and Markus Frind.



How Users Print Pages On The Web

I remember, about a year ago, desperately searching for data on how people printed pages on the web. I was curious because I had noticed that Flash ads would often mess up pages printed straight from the browser, sometimes failing to print the content of the page at all. This is a bad user experience that could cause visitors to start using a competitor’s site instead.

Unfortunately, I was unable to locate any studies. With the millions of sites that have “printer-friendly” pages, I thought someone would have published results. So I decided to do the research myself and slip the question into a survey during some pre-redesign research for a top-150 website. I surveyed over 2,000 users, asking them how they printed pages on the web. The results may surprise you.

Here are the results:
When printing articles or pages on the Web:

  • 19% of users use File > Print in their browser
  • 63.1% of users use the printer-friendly links on the page
  • 2.5% of users use the Control-P command on their keyboard
  • 12.3% of users copy and paste the text into Word
  • 3.1% of users copy and paste the text into an email or other application

A couple of notes about the survey participants. The site this was conducted on would be considered a sampling of the average Internet user; a site catering to web-savvy users would see different results. The site has also long had “printer-friendly” links, so long-time users would be more likely to use them. To remove some of that long-time-user bias, here are the same results filtered to only users who have used the site for less than 3 months (over 375 users).

Here are the results for newer users:
When printing articles or pages on the Web:

  • 25.3% of users use File > Print in their browser
  • 49% of users use the printer-friendly links on the page
  • 3.1% of users use the Control-P command on their keyboard
  • 17.5% of users copy and paste the text into Word
  • 5% of users copy and paste the text into an email or other application
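
That tenure filter is just a segmentation of the raw responses. A minimal sketch, assuming the responses live in a pandas DataFrame with hypothetical `method` and `tenure_months` columns:

    import pandas as pd

    def method_shares(responses: pd.DataFrame, max_tenure_months: float | None = None) -> pd.Series:
        """Share of respondents per printing method, optionally limited to newer users."""
        if max_tenure_months is not None:
            responses = responses[responses["tenure_months"] < max_tenure_months]
        return responses["method"].value_counts(normalize=True) * 100  # percentages

    # All users vs. users on the site less than 3 months:
    # method_shares(df), method_shares(df, max_tenure_months=3)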

I realize a survey isn’t the most accurate method to get at this data, but this data is difficult to collect any other way: it is impossible to track anything other than the printer-friendly pages of a site without conducting an expensive in-person behavioral study (preferably on the user’s own computer).

If you know of any other research on this topic, please share it in the comments.

Must-read Interview of Marissa Mayer

Over at SearchEngineLand, Gord Hotchkiss launches his new column “Just Behave” with what I would characterize as one of the most informative Google interviews I’ve ever read. It’s a shame it was posted on a Friday. Check out the Marissa Mayer interview.

The interview confirms a number of theories I’ve had, especially about their one-box results and how and when they decide to show sponsored results in the left column. We already knew their ranking algorithms were far superior to MSN’s and Yahoo’s, but they’ve even got well-designed algorithms to decide where to place ads on a given search, whether to show news, and whether to give a site in the organic results the magic one-box (the one that shows multiple links for the top site).

Google’s attention to user experience is the main reason Yahoo and MSN haven’t caught up. MSN continues to focus on selling ads rather than improving the search experience. MSN is banking on IE 7 and adding MSN search to more MSN properties as its primary methods for increasing market share, but it won’t work because the search experience is so poor.

Beefing up MSN AdCenter’s capabilities won’t help either. Advertisers are already impressed with AdCenter, but they want more traffic. MSN’s focus is terribly off, and I won’t be surprised if MSN’s market share in the search industry continues to slide. Here’s a key indicator: right now MSN is hiring 49 engineers for AdCenter, but only 9 for Live Search!