I’ve always been surprised by how much web analytics software sucks. Web sites produce reams of interesting statistical data, many sites have a strong interest in growing their traffic, and yet even the leaders in the field—Google Analytics and co.—do nothing to help: they make no attempt to analyze the data; they just display it in fairly pedestrian ways.
Every day, you log into your analytics dashboard, and you see basically the same picture that you saw the day before. It’s left up to you to find out what’s changed. Did your search referrals for some mid-ranking keyword rise hugely? Are you getting referrals from some site that never linked to you before?
Additionally, most packages don’t provide any mechanism for you to integrate your business-specific data with your web site data. Sure, you can sometimes hack this with things like tracking visits to special “checkout” pages, but fundamentally there’s no direct way of integrating off-site data (sales of an iPhone app, say).
I wrote some Emacs-based software to do all of this last year (including charts in the REPL), and have been using it fairly happily. Recently, I’ve started using Mathematica more and more to process log data, and have been very happy with the results.
A few days ago, Google released an API for Google Analytics (largely invisible but enormously successful). I spent a while over the weekend writing a bridge to Mathematica, and playing with the data, particularly that related to my iPhone app (Encyclopedia, which stores a copy of Wikipedia on your phone for offline browsing).

We can start off by doing fairly standard things, like number of visits from each search term by date:

And we can, say, chart the result. (First, we remove the “(not set)” results that the API returns.)
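For readers without Mathematica, the filtering and grouping step can be sketched in Python. This is an illustration rather than the code behind the chart: the `(date, keyword, visits)` row shape, the name `visits_by_keyword`, and the sample values are all assumptions.

```python
from collections import defaultdict

# Hypothetical rows in the shape (date, keyword, visits); the values are made up.
rows = [
    ("2009-01-05", "(not set)", 40),
    ("2009-01-05", "wikipedia iphone", 12),
    ("2009-01-06", "wikipedia iphone", 15),
    ("2009-01-06", "offline wikipedia", 7),
]

def visits_by_keyword(rows):
    """Return {keyword: [(date, visits), ...]}, skipping "(not set)" rows."""
    series = defaultdict(list)
    for date, keyword, visits in rows:
        if keyword == "(not set)":
            continue  # placeholder keyword, not a real search term
        series[keyword].append((date, visits))
    # Sort each series by date so it is ready to hand to a plotting library.
    return {keyword: sorted(points) for keyword, points in series.items()}

series = visits_by_keyword(rows)
```

Each value in `series` is a date-ordered list of points, which is the shape most charting tools expect for one line per search term.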

What fraction of this is from search traffic related to the iPhone app? We can select the promising search terms, and compare them with overall search traffic (as far as I know, no boolean logic is possible in the Google Analytics web interface):
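The boolean selection is easy to express once the data is out of the web interface. A minimal Python sketch follows, assuming `(keyword, visits)` rows; the matching rule in `is_app_related` is a hypothetical stand-in for "promising search terms", and the numbers are invented.

```python
def is_app_related(keyword):
    """Hypothetical rule: an iPhone/iPod term combined with a Wikipedia term."""
    k = keyword.lower()
    return ("iphone" in k or "ipod" in k) and ("wiki" in k or "encyclopedia" in k)

def fraction_of_visits(rows, predicate):
    """Share of total visits whose keyword satisfies the predicate."""
    total = sum(visits for _, visits in rows)
    if total == 0:
        return 0.0
    matched = sum(visits for keyword, visits in rows if predicate(keyword))
    return matched / total

# Made-up sample data, for illustration only.
rows = [
    ("wikipedia iphone offline", 30),
    ("iphone encyclopedia app", 20),
    ("emacs charts", 25),
    ("ipod touch battery", 25),
]

share = fraction_of_visits(rows, is_app_related)
```

Because the predicate is an ordinary function, any combination of `and`, `or`, and `not` can be used to define a keyword segment.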

Looks like most of the Google traffic is from the app-related keywords. We can also pretty trivially look at the geographical distribution of these visitors. First, let’s load the data:

And group the result by country:

We can now define a simple WorldPlot function, and use it with our data:

(Brighter colours indicate more visits.) Okay, that’s somewhat interesting. But how many sales does the app get per visit? I track my sales in Dabble DB, so I’ll load the data from there. They provide a CSV file of country code to unit sales mappings:

We can pretty easily import this into Mathematica:

Using Mathematica’s CountryData[] function, which knows about country codes (as well as their shapes), we can easily integrate the keyword referrals and the sales data to generate a heat map of purchases per search referral:
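The join itself is simple enough to sketch outside Mathematica. The Python below assumes the CSV has exactly two columns (country code, unit sales) and no header row, and that referrals are already grouped by the same country codes; the function names and all figures are invented for illustration.

```python
import csv
import io

def load_sales(csv_text):
    """Parse 'country_code,units' lines into {country_code: units}."""
    sales = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        code, units = row
        sales[code.strip().upper()] = int(units)
    return sales

def purchases_per_referral(sales, referrals):
    """Return {country_code: units / referrals} for countries with referrals."""
    return {
        code: sales.get(code, 0) / visits
        for code, visits in referrals.items()
        if visits > 0  # avoid dividing by zero for countries with no visits
    }

# Made-up sample data, for illustration only.
sales = load_sales("us,30\nDE,8\n\nJP,1\n")
referrals = {"US": 60, "DE": 4, "JP": 20, "FR": 0}
ratios = purchases_per_referral(sales, referrals)
```

The resulting ratios are what would be mapped to colours on the heat map; countries with no referrals are left out rather than shown as zero.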

The interesting outlier here is Mexico. Though they don’t visit the site much, they do buy the app a lot. (Hundreds of times to date.) I’m still not sure why. The two maps also show that, although Ireland sends quite a bit of traffic to the site, it converts quite poorly—perhaps because my .ie domain causes my search rank there to be artificially inflated.
Okay, ignoring the app for a while, we might also be interested in questions like “what are my most important search terms?” One way of measuring this is to look at the total time spent on your site that is driven by each search term. (Google Analytics tells you the average time on site for each term, but that’s not much help in telling you where you should direct your work.)
Let’s load the data:

And define a simple function that computes the total number of seconds on the site driven by each keyword:

We can compute the set of search terms that were used to get to the site:

And sort them by time on site:
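The total-time calculation and the sort can be sketched together in Python. The idea is that total time is visits multiplied by average time on site, summed per keyword. The `(keyword, visits, avg_seconds)` row shape, the function names, and the sample values are assumptions.

```python
def total_seconds_by_keyword(rows):
    """Sum visits * average seconds on site for each keyword."""
    totals = {}
    for keyword, visits, avg_seconds in rows:
        totals[keyword] = totals.get(keyword, 0.0) + visits * avg_seconds
    return totals

def rank_keywords(totals):
    """Return (keyword, total_seconds) pairs, largest total first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Made-up sample data, for illustration only.
rows = [
    ("wiki", 100, 30.0),
    ("iphone", 10, 120.0),
    ("wiki", 50, 60.0),
]

totals = total_seconds_by_keyword(rows)
ranked = rank_keywords(totals)
```

In this invented sample, "iphone" has the higher average time per visit but "wiki" accounts for more total time, which is the distinction the average alone hides.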

Ignoring the first two (corresponding to “(not set)”, which we could of course filter out), it seems that “wiki”, “wikipedia” and “iphone” are the important terms.
We might also be interested to see how keyword usage changes over time. Let’s load two months of keyword data:

And define a simple function to count the fraction of referrals that are from a particular keyword in some given dataset:

We can test it with something like:

Okay, 4% of the site’s visitors came by searching for something containing the word “iPhone”.
Now let’s compare two months of data:
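One way to make this comparison, sketched in Python: convert each month's keyword counts to shares of that month's referrals, then rank keywords by the change in share. Whether the original compared shares or raw counts is not stated, so treat the use of shares as an assumption, along with the function names and the invented counts.

```python
def keyword_shares(counts):
    """Convert {keyword: visits} into {keyword: fraction of all visits}."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {keyword: visits / total for keyword, visits in counts.items()}

def biggest_increases(before, after, top=3):
    """Keywords with the largest gain in share between two periods."""
    b, a = keyword_shares(before), keyword_shares(after)
    # A keyword missing from one period counts as a share of zero there.
    deltas = {k: a.get(k, 0.0) - b.get(k, 0.0) for k in set(a) | set(b)}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:top]

# Made-up sample data, for illustration only.
october = {"emacs": 50, "wiki": 50}
january = {"emacs": 25, "wiki": 25, "default.png": 50}

top_gainer = biggest_increases(october, january, top=1)
```

Comparing shares rather than raw counts keeps an overall rise or fall in traffic from dominating the ranking.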

These are the search keywords that showed the biggest increase between October and January. And they correspond quite closely to what you’d expect—default.png files on the iPhone and Back To My Mac are both topics I wrote about in the intervening time period.
In keeping with the spirit of analysing changes, we can look at non-search referrals:

We can drop the direct referrals, group them by source, and select the first referral from each site:
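A rough Python equivalent of this step is below. It assumes `(date, source, visits)` rows with ISO-formatted date strings (so that string comparison matches date order) and that direct traffic is labelled `"(direct)"`; the function name and sample rows are hypothetical.

```python
def first_referrals(rows):
    """Return {source: (first_date, total_visits)}, ignoring direct traffic."""
    summary = {}
    for date, source, visits in rows:
        if source == "(direct)":
            continue  # not a referral from another site
        if source in summary:
            first, total = summary[source]
            summary[source] = (min(first, date), total + visits)
        else:
            summary[source] = (date, visits)
    return summary

# Made-up sample data, for illustration only.
rows = [
    ("2009-01-10", "example.org", 5),
    ("2009-01-03", "example.com", 2),
    ("2009-01-04", "(direct)", 90),
    ("2009-01-08", "example.org", 20),
]

summary = first_referrals(rows)
```

Each entry pairs the date a site first sent traffic with the total it has sent, which are the two axes needed for a first-referral-versus-traffic chart.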

And then we can do something like plot the first referral from each site against the amount of traffic it sent—a chart of buzz, basically:

Next, we can look at how traffic flows around the site. First, we load (landing page, exit page) tuples:

And then hack together a function to generate a directed graph:
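As an illustration of the idea (not the Mathematica function used here), a weighted directed graph can be built from the tuples in a few lines of Python. The sketch assumes one `(landing, exit)` tuple per visit; the function name and page paths are invented.

```python
from collections import Counter

def build_flow_graph(pairs):
    """Return {landing_page: {exit_page: visits}} as a weighted adjacency map."""
    edge_weights = Counter(pairs)  # count identical (landing, exit) pairs
    graph = {}
    for (landing, exit_page), weight in edge_weights.items():
        graph.setdefault(landing, {})[exit_page] = weight
    return graph

# Made-up sample data, for illustration only.
pairs = [
    ("/", "/about"),
    ("/", "/about"),
    ("/", "/"),
    ("/blog", "/"),
]

graph = build_flow_graph(pairs)
```

An edge from a page to itself, such as `"/"` to `"/"` above, represents visits that landed on and left from the same page.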

Yielding a pretty interesting result:

As a final example, we can, for no good reason, take advantage of the latitude and longitude metrics that the API provides and quickly create a video of visits by hour, showing traffic move across the globe. First, we load the data, and define two helper functions:

We can test it out on one of the hours:

Looks good. Let’s export an image for each hour, which we can then join with QuickTime:

The result is here. (You probably want to have it loop when you play it.)
Mathematica doesn’t get much attention from the programming community (largely because of Wolfram’s pricing, as far as I can tell). But its power is undeniable—I spent about 5 hours writing the Google Analytics interface and generating the above data. I wrote this blog post over lunch. If anyone else is interested in using the Mathematica/Google Analytics interface, let me know in the comments, and I’ll package it up and release it somewhere (it requires modified versions of a few libraries).
Lastly, over the past year, I spent a lot of time talking to my friend Avi about the state of web analytics. He and the guys at Dabble DB decided to do something about it, and it looks like dshbrd will launch soon. From what I’ve seen so far, it looks like it’ll be win.