Archive for the ‘Analytics’ Category

Days in the Month Bias for Web Analytics

March 2nd, 2009

One often-overlooked variable that causes fluctuation in month-over-month (M/M) analysis of web analytics data (and, I suppose, other data sets) is the number of days in the month. February is a prime example: you go from 31 days in January to only 28 in February (29 in leap years), resulting in an apparent 9.7% loss in traffic.

Below is a chart assuming steady traffic, meaning exactly the same amount of traffic every day for the entire year. See how wildly it swings purely because of the number of days in each month? Keep this in mind whenever you use M/M analytics.

[Chart: Days in Month Bias for Web Analytics]

Use the numbers below as a reference for the day-count bias in each month-over-month comparison; they can help you explain your monthly reports:

  • January: 0.00%
  • February: 9.68% loss (6.45% loss during leap year)
  • March: 10.71% gain (6.90% gain during leap year)
  • April: 3.23% loss
  • May: 3.33% gain
  • June: 3.23% loss
  • July: 3.33% gain
  • August: 0.00%
  • September: 3.23% loss
  • October: 3.33% gain
  • November: 3.23% loss
  • December: 3.33% gain
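
The list above can be reproduced with Python's standard `calendar` module. This is a minimal sketch, assuming each month is compared with the month immediately before it (December of the prior year for January) and that traffic is identical every day; the helper names `day_count_bias` and `per_day_change` are made up for this example. The second function shows one way to take the bias out of a real M/M comparison: compare average daily traffic instead of monthly totals.

```python
import calendar


def day_count_bias(year: int) -> dict[str, float]:
    """Apparent M/M change (%) caused only by the number of days in each month,
    assuming exactly the same traffic every day."""
    bias = {}
    for month in range(1, 13):
        days = calendar.monthrange(year, month)[1]
        # January is compared with December of the previous year.
        prev_year, prev_month = (year - 1, 12) if month == 1 else (year, month - 1)
        prev_days = calendar.monthrange(prev_year, prev_month)[1]
        bias[calendar.month_name[month]] = round((days / prev_days - 1) * 100, 2)
    return bias


def per_day_change(total: float, days: int, prev_total: float, prev_days: int) -> float:
    """M/M change (%) in average daily traffic, which removes the day-count bias."""
    return ((total / days) / (prev_total / prev_days) - 1) * 100


if __name__ == "__main__":
    for name, pct in day_count_bias(2009).items():
        print(f"{name}: {pct:+.2f}%")
```

For 2009, `day_count_bias(2009)["February"]` gives -9.68, and for 2008 it gives -6.45, matching the leap-year figure. With steady traffic of 100 visits a day, `per_day_change(2800, 28, 3100, 31)` returns 0.0, so the apparent February drop disappears.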

Another variable that may be overlooked is how the days of the week line up within the month. For some sites what matters is the number of weekend days in the month; for others it may be the number of Mondays.
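
Counting how the weekdays fall in a given month is just as easy with the standard library. A sketch; `weekday_counts` and `weekend_days` are hypothetical helper names, and which weekdays actually matter (weekends, Mondays, etc.) depends on your site.

```python
import calendar
from collections import Counter


def weekday_counts(year: int, month: int) -> Counter:
    """How many Mondays, Tuesdays, ... fall in a month.
    Keys are calendar.MONDAY (0) through calendar.SUNDAY (6)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return Counter(calendar.weekday(year, month, day) for day in range(1, days_in_month + 1))


def weekend_days(year: int, month: int) -> int:
    """Number of Saturdays plus Sundays in a month."""
    counts = weekday_counts(year, month)
    return counts[calendar.SATURDAY] + counts[calendar.SUNDAY]


if __name__ == "__main__":
    for month in (2, 3):
        counts = weekday_counts(2009, month)
        print(calendar.month_name[month],
              "Mondays:", counts[calendar.MONDAY],
              "weekend days:", weekend_days(2009, month))
```

For example, March 2009 has five Sundays and five Mondays but only four Saturdays (nine weekend days), while February 2009 has exactly four of every weekday.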

Posted in Analytics | Comments (8)

Long Tail of Search

November 6th, 2008

Finally, some solid evidence showing how long the long tail of search really is! Having worked for some big, high-traffic sites, I was always frustrated that other public reports underestimated the true length of the long tail. So I did my own research, and it was published on the Hitwise blog:

Sizing Up the Long Tail of Search

[Chart: Long Tail of Search Research]

Here’s a sneak peek:

“After great dissatisfaction with the existing research, which I felt vastly understated the true size of the long tail, I decided to do my own research…There’s so much traffic in the tail it is hard to even comprehend. To illustrate, if search were represented by a tiny lizard with a one-inch head, the tail of that lizard would stretch for 221 miles.”

Understanding the long tail and how to target it from an SEO standpoint is no simple task. I hope this article sheds some light on how important long tail traffic is.

In my experience, I’ve ranked for head terms and I’ve ranked for millions of tail terms. I’d gladly trade in the head terms for a larger piece of the tail. A few companies have learned this, including the search engines, but they’d prefer you don’t know how much of a gold mine it really is.

Posted in Analytics, Search | Comments (2)

Possible Reduction In Spam

October 15th, 2008

The NY Times reports that the largest spam gang on the Internet is being shut down, starting with their assets being frozen. Some key numbers shared in the article:

  • This group makes $400,000 a month
  • They send 10 billion spam emails per day
  • This group, at one point, sent out 1/3rd of all spam
  • 90% of all email people receive is spam

If these numbers are true, then:

  • The group made 1/7500th of a cent for each email sent (the only way that could be cost-effective is if the emails were sent from unknowingly infected computers)
  • Email users should expect a 33% drop in spam and a 30% drop in overall incoming email volume (one third of the 90% of email that is spam).
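
These two bullets are straight arithmetic on the article's figures, which exact fractions make easy to check. A sketch, assuming a 30-day month (the article gives revenue per month but email volume per day) and taking the article's numbers at face value; under those premises the overall drop in incoming email is one third of the 90% that is spam, i.e. 30%.

```python
from fractions import Fraction

MONTHLY_REVENUE_CENTS = 400_000 * 100   # $400,000 a month, in cents
EMAILS_PER_DAY = 10_000_000_000         # 10 billion spam emails per day
DAYS_PER_MONTH = 30                     # assumption: a 30-day month
GROUP_SHARE_OF_SPAM = Fraction(1, 3)    # the group sent 1/3 of all spam
SPAM_SHARE_OF_EMAIL = Fraction(9, 10)   # 90% of all email is spam


def cents_per_email() -> Fraction:
    """Revenue per email sent, in cents."""
    return Fraction(MONTHLY_REVENUE_CENTS, EMAILS_PER_DAY * DAYS_PER_MONTH)


def drops() -> tuple[Fraction, Fraction]:
    """(drop in spam, drop in total incoming email) if the group stops sending."""
    spam_drop = GROUP_SHARE_OF_SPAM
    total_drop = GROUP_SHARE_OF_SPAM * SPAM_SHARE_OF_EMAIL
    return spam_drop, total_drop


if __name__ == "__main__":
    print("cents per email:", cents_per_email())   # 1/7500
    spam, total = drops()
    print(f"spam drop: {float(spam):.1%}, total email drop: {float(total):.1%}")
```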

I wish it were true, but I’m skeptical that we’ll see such drops. Unless the penalties are extremely harsh, other spammers will step in to get a piece of the newly available spam pie.

Posted in Analytics, Web | Comments (0)

Omniture Buying Visual Sciences for $394 Million

October 26th, 2007

Say it ain’t so. Omniture (Nasdaq: OMTR) is buying Visual Sciences (Nasdaq: VSCN) for a reported $394 million. The combination of the two best-of-breed analytics providers can’t be a good thing for companies that buy web analytics solutions.

Omniture states that it will rapidly grow their technologies, but I don’t see a near monopoly being a good thing for most companies. Competition is good. I’ve been a customer of both companies and felt they were by far the two best hosted analytics offerings.

It will be interesting to see whether regulators allow the purchase, because it feels too close to a monopoly to me. I assume Visual Sciences stockholders will approve the purchase, as the company has been riddled with problems, including losing many key employees to Omniture. The acquisition is expected to close in mid-2008.

Congrats to Omniture. This news shows just how strong Omniture has become, especially after purchasing Offermatica last month.

Posted in Analytics | Comments (1)

Detailed Google Search Referrer Data

October 2nd, 2007

I found some interesting nuggets when I decided to zero in on Google referrer data (as reported by Omniture) for one particular high-volume keyword.

The word was “lasagna,” and when I dug into the Google data, I noticed some interesting things. Google shares the following data in the referrer URL. To avoid revealing the actual traffic volume of the high-ranking website, I express each search type as a percentage of the standard lasagna search on Google (without quotes).

Google Searcher Keyword Variations:

Standard lasagna search: 100%
(this is the search I base the rest of the data on)

lasagna misspelled & clicked on Google “did you mean” link: 74%
(this was much higher than I anticipated – people probably ignored the “g” in lasagna).

Lasagna search: 8%
(I guess some people figure capitalizing the first letter will get them better results)

lasagna_ : 4%
(the underscore denotes a space after the search term – I guess some people can’t help but drop their thumbs down on that nice big spacebar)
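
Buckets like the keyword variations above can be pulled out of referrer URLs programmatically. A minimal sketch, assuming the search terms arrive in the `q` parameter of the Google referrer URL; the bucket names and example URLs are hypothetical, and detecting "did you mean" clicks or search-button clicks would need other referrer parameters that this sketch does not handle.

```python
from urllib.parse import parse_qs, urlparse


def query_from_referrer(referrer: str) -> str:
    """Raw search terms from a referrer's `q` parameter, or '' if absent.
    parse_qs decodes '+' and %-escapes but keeps a trailing space."""
    return parse_qs(urlparse(referrer).query).get("q", [""])[0]


def classify_query(referrer: str, term: str = "lasagna") -> str:
    """Hypothetical buckets mirroring the keyword variations above."""
    query = query_from_referrer(referrer)
    if query == term:
        return "standard"
    if query == term.capitalize():
        return "capitalized"
    if query == term + " ":
        return "trailing space"
    return "other"


if __name__ == "__main__":
    for ref in (
        "http://www.google.com/search?q=lasagna",
        "http://www.google.com/search?q=Lasagna",
        "http://www.google.com/search?q=lasagna+",
    ):
        print(ref, "->", classify_query(ref))
```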

Google Searcher Behaviors and Platforms:

lasagna search, but clicked search button: 40%
(looks like most people hit enter, but some take the time to click the search button)

standard lasagna search via Firefox: 19%
(the number of Firefox users continues to grow, and Google likes tracking them)

standard lasagna search via iGoogle: 6%
(looks like some people are using iGoogle as their homepage)

Google Non-U.S. Data

UK standard lasagna search: 9.5%
Google UK misspelled did you mean correction: 34.4%
Google UK lasagna search, but clicked search button: 2.5%

Looks like our friends from the UK need to work on their spelling. The misspelled version is about 3.6 times as common as the correct spelling!

Google Canada standard lasagna search: 23.6%
Google Canada misspelled did you mean correction: 15.1%
Google Canada lasagna search, but clicked search button: 8.2%

Our friends from Canada are a little less mousey (hit enter instead) and slightly better spellers than Americans.
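
Because every figure above is relative to the same U.S. baseline, the country comparisons come down to ratios within each country. A sketch using the numbers exactly as reported in this post; treating the search-button figure as a rate per standard search is my assumption about how best to compare them.

```python
# Figures from this post, each relative to the U.S. standard "lasagna" search (= 100).
REFERRALS = {
    "US": {"standard": 100.0, "did_you_mean": 74.0, "button": 40.0},
    "UK": {"standard": 9.5, "did_you_mean": 34.4, "button": 2.5},
    "Canada": {"standard": 23.6, "did_you_mean": 15.1, "button": 8.2},
}


def country_ratios(row: dict[str, float]) -> dict[str, float]:
    """Within-country ratios, so countries of different sizes can be compared."""
    return {
        # misspelled-and-corrected searches per correctly spelled search
        "misspelled_per_correct": row["did_you_mean"] / row["standard"],
        # search-button clicks per standard search
        "button_per_standard": row["button"] / row["standard"],
    }


if __name__ == "__main__":
    for country, row in REFERRALS.items():
        r = country_ratios(row)
        print(f"{country}: {r['misspelled_per_correct']:.2f} misspelled per correct, "
              f"{r['button_per_standard']:.2f} button clicks per standard search")
```

The UK misspelling ratio comes out at about 3.6, and Canada's two ratios (0.64 and 0.35) are both below the U.S. figures (0.74 and 0.40).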

We can’t draw too many conclusions from this data, but it does highlight what you can learn by looking at your referrer data more closely. I invite you to spot-check a couple of terms that you rank for and share your findings in the comments.

Posted in Analytics, Search | Comments (0)

The Beauty of Seasonality in Data

July 20th, 2007

In terms of dealing with data, many companies struggle with seasonal fluctuations.
[Chart: seasonality of data]
Some of the most common seasonal fluctuations I’ve seen are due to:

  • Holidays – like Christmas, Thanksgiving, Labor Day, etc.
  • Hallmark holidays – like Valentine’s Day, Mother’s Day, etc.
  • Traditions – like New Year’s resolutions, etc.
  • School-related events – like spring & summer breaks, football season, homecoming, proms, school year starts, graduations, reunions, etc.
  • Actual seasonal changes – like the cold of winter, summer heat, rainy season, etc.

While some people see seasonality as the bane of their existence, I see it another way. For me…

SEASONALITY = OPPORTUNITY

The way I see it, the more complex something gets, the bigger the opportunity for those who can deal with it, because the barrier to entry is high and few companies are savvy enough to figure out how to handle it properly.

Why Seasonality is Difficult
Two primary reasons: 1) Seasonality mucks up data and 2) obtaining 3rd party seasonal data is not easy.

In the first case, companies that don’t learn to deal with it correctly will make poor decisions. Not realizing that Easter falls in March next year could cause you to repeat this year’s activity 2-3 weeks too late, but not realizing that Easter was the cause of your increased sales this year is an even bigger missed opportunity. Always pay attention to peaks and valleys for your site, or for sections of your site, to better understand the human behavior that causes such fluctuations. If you can figure out why, then you have a competitive advantage going into the next time that event occurs.
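
Moving holidays like Easter can be computed ahead of time and added to your event calendar. The sketch below uses the well-known anonymous Gregorian computus (often credited to Meeus, Jones and Butcher), not anything from a particular analytics tool; it shows Easter moving from April 8, 2007 to March 23, 2008, a 16-day shift.

```python
from datetime import date


def easter(year: int) -> date:
    """Gregorian Easter Sunday via the anonymous Gregorian algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30          # epact-related term
    i, k = divmod(c, 4)
    to_sunday = (32 + 2 * e + 2 * i - h - k) % 7  # days to the following Sunday
    m = (a + 11 * h + 22 * to_sunday) // 451
    month, day = divmod(h + to_sunday - 7 * m + 114, 31)
    return date(year, month, day + 1)


if __name__ == "__main__":
    this_year, next_year = easter(2007), easter(2008)
    # Compare on the same calendar year to see how much earlier Easter falls.
    shift = (this_year.replace(year=2008) - next_year).days
    print(this_year, next_year, f"{shift} days earlier")  # 2007-04-08 2008-03-23 16 days earlier
```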

In the second case, sure, there is Google Trends if you know in advance which terms to look at (which is not usually the case), but if you were to jump into Google AdWords or any keyword tool wanting to know what will be a hit two or three months into the future, you can’t rely on average data or what happened last month. I’ve found great value in services like Hitwise (though very expensive), where I can look at keyword trends over time and see what keywords drove traffic to my competitors 13 months ago.

How to Deal With Seasonality
To best tackle seasonality, mine your data and mine external industry data. Look for monthly, weekly and even daily fluctuations. Keep tabs on w/w, m/m and y/y growth rates. Also, keep a calendar of events that may cause fluctuations in your data (such as a site redesign), so you don’t mistakenly attribute a fluctuation to something else. And when you can identify a source of seasonality, make an action plan for next year.
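
One way to keep tabs on w/w and y/y growth from daily numbers is sketched below. It assumes a hypothetical dict mapping each `date` to that day's visits. Comparing against the week 364 days earlier (exactly 52 weeks) keeps the same weekdays in both windows, so the y/y rate isn't skewed by how the weekdays line up; for m/m, comparing average daily traffic rather than monthly totals avoids the days-in-the-month bias.

```python
from datetime import date, timedelta


def pct_change(current: float, previous: float) -> float:
    """Growth rate in percent."""
    return (current / previous - 1) * 100


def week_total(daily: dict[date, float], end: date) -> float:
    """Total visits for the 7 days ending on `end` (inclusive); missing days count as 0."""
    return sum(daily.get(end - timedelta(days=i), 0.0) for i in range(7))


def wow_yoy(daily: dict[date, float], end: date) -> tuple[float, float]:
    """Week-over-week and year-over-year growth (%) for the week ending on `end`."""
    this_week = week_total(daily, end)
    last_week = week_total(daily, end - timedelta(days=7))
    # 364 days = 52 weeks, so the year-ago window covers the same weekdays.
    year_ago = week_total(daily, end - timedelta(days=364))
    return pct_change(this_week, last_week), pct_change(this_week, year_ago)
```

With flat traffic both rates are 0; if the latest week rises from 100 to 110 visits a day, both come out at 10%.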

Understanding your seasonality is the first part. Acting upon the intelligence is the second. For example, if a weight loss company discovers that summer high school reunions cause a burst of new customers in June, it could target its late May to June advertising at reunion planning sites like Facebook, Classmates, Yahoo Groups and reunions.com, or bid on long-tail local reunion terms like “ehs 2007 reunion” (note: not a single advertiser has figured this one out yet) or “roosevelt high school 1994 class reunion”. It could even create a special plan targeted to those customers (Rapid Reunion Weight-Loss Program).

To be fair, many companies, especially retailers, have seasonality in their blood, but even these companies could easily improve if they understood exactly what is driving people’s interest and the exact timing of it.

If you are in a business affected by seasonality, be happy that your data has a pulse:
[Chart: seasonal data]

instead of a flat line like this (call life support, we’ve got a flatliner):
[Chart: non-seasonal data]
For those fascinated by seasonality, I recommend reading Click by Bill Tancer.

For more false-seasonality fun, read: Days in the Month Bias for Analytics

Posted in Analytics, Search | Comments (0)

Web Analytics Provider Data Tested

May 8th, 2007

I love it when someone takes the time to research and answer a question that many people have. In this case, Stone Temple decided to put several web analytics providers to the test by installing multiple solutions on a few sites to see the differences. Their results can be seen here. Be sure to read the whole report because some of the differences were due to implementation mistakes.

Here are the findings I consider key:

  • Be prepared for different numbers whenever switching analytics packages. None seem to count the data in the exact same way.
  • 3rd-party cookie deletion rate exceeds 1st-party cookie deletion by about 13%. More proof that you shouldn’t use 3rd-party cookies.
  • WebTrends, ClickTracks and Google Analytics may overcount uniques, and WebSideStory (HBX) and Unica may undercount them.
  • ClickTracks may severely undercount page grouping data.

Potential flaws with the study:

  • Only four sites were used, and they were pre-screened for having large enough paid search spending.
  • None of the four sites is a high-traffic site. I could only find two of them in ComScore, and the site with the most traffic sees only about 200k U.S. visitors a month. I’d love to see the same study on sites with more visitors, which would make the data much more reliable.
  • In an effort to “protect” the participating sites from revealing their real traffic volume, the time period for the daily uniques was not disclosed, and each analytics package probably has different rules on what constitutes a daily unique. For example, some may cut off a “visit” at midnight and count its continuation the next day as another unique “visit,” while others may not. Likewise, some may expire visits after different periods of inactivity (30 minutes, etc.). I would have liked to see a weekly or monthly uniques comparison instead.
  • When I first heard of this study, I was excited that we might finally learn a lot about the different providers and which is the best solution, but I was a bit disappointed when the results were released. It sounds like we may learn more when the final results are released, though those may be more along the lines of implementation findings. I hope it inspires more people to run more tests.
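
Here is a small illustration of why visit counts can differ between packages even on identical hits. It is a hypothetical sessionizer, not any vendor's actual rules: a new visit starts after `timeout` of inactivity and, optionally, whenever the date changes at midnight.

```python
from datetime import datetime, timedelta


def count_visits(hits: list[datetime],
                 timeout: timedelta = timedelta(minutes=30),
                 split_at_midnight: bool = False) -> int:
    """Count visits in one visitor's hits under a given set of visit rules."""
    visits = 0
    previous = None
    for hit in sorted(hits):
        starts_new_visit = (
            previous is None
            or hit - previous > timeout                                    # inactivity timeout
            or (split_at_midnight and hit.date() != previous.date())      # midnight cutoff
        )
        if starts_new_visit:
            visits += 1
        previous = hit
    return visits


if __name__ == "__main__":
    hits = [datetime(2007, 5, 7, 23, 50), datetime(2007, 5, 7, 23, 55),
            datetime(2007, 5, 8, 0, 10), datetime(2007, 5, 8, 0, 20)]
    print(count_visits(hits))                                  # 1
    print(count_visits(hits, split_at_midnight=True))          # 2
    print(count_visits(hits, timeout=timedelta(minutes=10)))   # 2
```

The same four page views count as one visit or two depending only on the rules, which is why a daily-uniques comparison across packages is hard to interpret.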

Posted in Analytics | Comments (1)

Please Stop Quoting Alexa Data

March 20th, 2007

Far too often I hear people quoting Alexa data. Just last week, at the 2007 Omniture Summit, I watched Tim O’Reilly use Alexa charts to prove Web 2.0 success in front of 1,000 smart web analytics professionals. I know I couldn’t have been the only person in the crowd to notice. For Tim’s benefit, and for anyone else who uses Alexa data, please take note:

ALEXA DATA IS TREMENDOUSLY FLAWED

I touched on this in a competitive intelligence metrics post back in October, showing that Alexa’s data is less accurate at determining true site traffic than the number of characters in the domain name, but now I’d like to really illustrate how far off Alexa’s data is.

Many people have pointed out Alexa’s data is biased towards a certain crowd and can be manipulated (see the links at the bottom of this post), but none have illustrated the margin of error that I’m about to. Below I take a look at two very different sites with very different traffic stats.

Site 1: Allrecipes – Allrecipes is a leading food site – as you might expect, Allrecipes users are similar to what you might see on the Internet as a whole, though slightly more female.

Site 2: SEOMoz – SEOMoz is a site that caters to the SEO and online marketing community – a crowd more likely to install the Alexa toolbar.

Using Alexa, you might conclude that SEOMoz receives more traffic than Allrecipes:

Alexa Reach Chart:
[Chart: Alexa Reach, Allrecipes vs. SEOMoz]

Alexa Rank Chart:
[Chart: Alexa Rank, Allrecipes vs. SEOMoz]

Both sites are very popular within their target audiences, but despite what Alexa may show, Allrecipes has much more traffic. Let’s face it, more people cook food than perform SEO! In fact, if you were to populate the above charts with actual data, SEOMoz would be a flat sliver near the x-axis. Here’s some real data from Dec. ’06:

Allrecipes Unique Visitors: 11,023,187
SEOMoz Unique Visitors: 102,523

If you were to use Alexa charts to estimate one site’s traffic from the other site’s real numbers, your traffic estimate would be off by approximately 11,842%. Numbers that big are often difficult to grasp, so I like to put it in perspective. A mistake of that magnitude is the equivalent of:

  • The CIA mixing up the population of Ohio with that of China.
  • Your accountant saying you owe $1,000 to the IRS, when you really owe $119,417.
  • A cop pulling you over for doing 60 in a 30, when you were really going half a mile per hour.
  • Telling your spouse you’ll be home in three hours, then showing up 15 days later.
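
The percentages here follow the usual percent-error formula: the gap between the estimate and the actual value, as a share of the estimate. A quick sketch checking the accountant example and the real Dec. ’06 ratio between the two sites; the Alexa-implied ratio itself isn’t given in this post, so it isn’t computed, and `percent_off` is a hypothetical helper name.

```python
def percent_off(estimate: float, actual: float) -> float:
    """How far `actual` is from `estimate`, as a percentage of the estimate."""
    return (actual - estimate) / estimate * 100


ALLRECIPES_UNIQUES = 11_023_187  # Dec. '06
SEOMOZ_UNIQUES = 102_523         # Dec. '06

if __name__ == "__main__":
    print(f"Allrecipes / SEOMoz: {ALLRECIPES_UNIQUES / SEOMOZ_UNIQUES:.1f}x")  # 107.5x
    print(f"$1,000 vs $119,417: {percent_off(1_000, 119_417):,.1f}% off")      # 11,841.7% off
```

The accountant example works out to 11,841.7%, which rounds to the 11,842% figure.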

These are mistakes that none of us could get away with, so why should we let Alexa?

I’m not the first to prove Alexa data is flawed. Here are links to other Alexa skeptics:
Peter Norvig, Paul Stamatiou, Josh Pigford, Matt Cutts, Rand Fishkin (thanks for the data!), Greg Linden, Bruce Stewart, Alex Iskold, John Chow, and Markus Frind.

Posted in Analytics, Search, Web | Comments (21)

How Users Print Pages On The Web

February 7th, 2007

I remember that about a year ago I was desperately searching for data on how people printed pages on the web. I was curious because I had noticed that Flash ads would often mess up pages printed straight from the browser, in many cases leaving the page content out of the printout. This is a bad user experience that could drive visitors to a competitor’s site.

Unfortunately, I was unable to locate any studies. I thought that, with millions of sites offering “printer-friendly” pages, someone would have published results. I decided to do the research myself and slip it into a survey during some pre-redesign research for a top-150 website. I surveyed over 2,000 users, asking them how they printed pages on the web. The results may surprise you.

Here are the results:
When printing articles or pages on the Web:

  • 19% of users use File > Print in their browser
  • 63.1% of users use the printer-friendly links on the page
  • 2.5% of users use the Control-P command on their keyboard
  • 12.3% of users copy and paste the text into Word
  • 3.1% of users copy and paste the text into an email or other application

A couple of notes about the survey participants. The site this was conducted on draws a sample representative of the average Internet user; a site catering to web-savvy users would have different results. The site has also long had “printer-friendly” links, so long-time users would be more likely to use them. To remove some of that long-term-user bias, here are the same results filtered to only users who have used the site for less than 3 months (over 375 users).

Here are the results for newer users:
When printing articles or pages on the Web:

  • 25.3% of users use File > Print in their browser
  • 49% of users use the printer-friendly links on the page
  • 3.1% of users use the Control-P command on their keyboard
  • 17.5% of users copy and paste the text into Word
  • 5% of users copy and paste the text into an email or other application
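
Filtering survey results by tenure like this is a simple filter-and-count. A sketch, assuming hypothetical response records with `method` and `tenure_months` fields (the raw survey data isn’t published here):

```python
from collections import Counter
from typing import Optional


def share_by_method(responses: list[dict],
                    max_tenure_months: Optional[int] = None) -> dict[str, float]:
    """Percent of respondents choosing each printing method, optionally only
    those who have used the site for fewer than `max_tenure_months` months."""
    methods = [
        r["method"] for r in responses
        if max_tenure_months is None or r["tenure_months"] < max_tenure_months
    ]
    counts = Counter(methods)
    return {method: 100 * n / len(methods) for method, n in counts.items()}
```

Calling it once without a cutoff and once with `max_tenure_months=3` gives the two kinds of breakdown shown above.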

I realize a survey isn’t the most accurate method for getting at this data, but this data is difficult to collect any other way, because it is impossible to track anything other than a site’s printer-friendly pages without conducting an expensive in-person behavioral study (preferably on the users’ own computers).

If you know of any other research on this topic, please share it in the comments.

Posted in Analytics, General, Web | Comments (0)

Must-read Interview of Marissa Mayer

January 27th, 2007

Over at SearchEngineLand, Gord Hotchkiss launches his new column “Just Behave” with what I would characterize as one of the most informative Google interviews I’ve ever read. It’s a shame it was posted on a Friday. Check out the Marissa Mayer interview.

The interview confirms a number of theories I’ve had, especially about their one-box results and how and when they decide to show sponsored results in the left column. We already knew their ranking algorithms were far superior to those of MSN and Yahoo, but they’ve even got well-designed algorithms to decide where to place ads on a given search, whether to show news, and whether to give a site in the organic results the magic one-box (the one that shows multiple links for the top site).

Google’s attention to user-experience is the main reason Yahoo and MSN haven’t caught up. MSN continues to focus on selling ads rather than improving the search experience. MSN is banking on IE 7 and adding MSN search to more MSN properties as their primary methods for increasing market share, but it won’t work because the search experience is so poor.

Beefing up the MSN AdCenter abilities won’t help either. Advertisers are already impressed with AdCenter, but they want more traffic. MSN’s focus is terribly off, and I won’t be surprised if MSN’s market share in the search industry continues to slide. Here’s a key indicator: right now MSN is hiring 49 engineers for AdCenter, but only 9 for Live Search!

Posted in Analytics, Search | Comments (0)