App School

June 12th, 2009

App School just went live. It’s an iPhone development training course taught by my friend Daniel, and run by SQT (my Mum’s training company) and Mulley Communications. The courses will be based in Dublin, and should be a pretty cool chance for anyone Irish to quickly get up to speed with everything related to iPhone app development (including some tips on navigating the notorious app submission process…).

I’ve known Daniel since first year in secondary school; he convinced me to learn PHP, my first real (“real”?) programming language. He’s one of the best programmers I’ve ever worked with, and just won an IBM Open Source Award for his work on Yakumo [PDF], an open-source library for using the iPhone as a game input device. Although he’s only just graduating now, a couple of companies have already spotted his win-ness: he interned with the hardcore physics guys at Havok in San Francisco last summer.

App School is lucky to have him, and I think the courses are going to be killer.

See also: Damien, Web 2 Ireland, Irish Times.

Hacking for fun and profit with Mathematica and the Google Analytics API

April 27th, 2009

I’ve always been surprised by how much web analytics software sucks. Web sites produce reams of interesting statistical data, many sites have a strong interest in growing their traffic, and yet even the leaders in the field—Google Analytics and co.—do nothing to help: they make no attempt to analyze the data; they just display it in fairly pedestrian ways.

Every day, you log into your analytics dashboard, and you see basically the same picture that you saw the day before. It’s left up to you to find out what’s changed. Did your search referrals for some mid-ranking keyword rise hugely? Are you getting referrals from some site that never linked to you before?

Additionally, most packages don’t provide any mechanism for you to integrate your business-specific data with your web site data. Sure, you can sometimes hack this with things like tracking visits to special “checkout” pages, but fundamentally there’s no direct way of integrating off-site data (sales of an iPhone app, say).

I wrote some Emacs-based software to do all of this last year (including charts in the REPL), and have been using it fairly happily. Recently, I’ve started using Mathematica more and more to process log data, and have been very happy with the results.

A few days ago, Google released an API for Google Analytics (largely invisible but enormously successful). I spent a while over the weekend writing a bridge to Mathematica, and playing with the data, particularly that related to my iPhone app (Encyclopedia, which stores a copy of Wikipedia on your phone for offline browsing).

We can start off by doing fairly standard things, like number of visits from each search term by date:

And we can, say, chart the result. (First, we remove the “(not set)” results that the API returns.)

What fraction of this is from search traffic related to the iPhone app? We can select the promising search terms, and compare them with overall search traffic (as far as I know, no boolean logic is possible in the Google Analytics web interface):
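The original Mathematica cell is not reproduced here. As a rough Python sketch of the same idea, the selection can be written as an ordinary boolean filter. The rows, the `APP_TERMS` list and the function names are all made up for illustration; the real data came from the Analytics API.

```python
# Hypothetical (keyword, visits) rows; not the author's real data.
rows = [
    ("wikipedia iphone", 120),
    ("offline wikipedia", 80),
    ("iphone wiki app", 60),
    ("back to my mac", 40),
    ("(not set)", 500),
]

APP_TERMS = ("wiki", "iphone", "encyclopedia")  # assumed list of promising terms


def is_app_related(keyword):
    """True if the keyword mentions any app-related term (case-insensitive)."""
    lowered = keyword.lower()
    return any(term in lowered for term in APP_TERMS)


def app_share(rows):
    """Fraction of search visits that are app-related, ignoring '(not set)'."""
    search = [(k, v) for k, v in rows if k != "(not set)"]
    total = sum(v for _, v in search)
    app = sum(v for k, v in search if is_app_related(k))
    return app / total if total else 0.0


print(f"{app_share(rows):.1%} of search visits are app-related")
```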

Looks like most of the Google traffic is from the app-related keywords. We can also pretty trivially look at the geographical distribution of these visitors. First, let’s load the data:

And group the result by country:

We can now define a simple WorldPlot function, and use it with our data:

(Brighter colours indicate more visits.) Okay, that’s somewhat interesting. But how many sales does the app get per visit? I track my sales in Dabble DB, so I’ll load the data from there. They provide a CSV file of country code to unit sales mappings:

We can pretty easily import this into Mathematica:

Using Mathematica’s CountryData[] function, which knows about country codes (as well as their shapes), we can easily integrate the keyword referrals and the sales data to generate a heat map of purchases per search referral:
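The join itself is simple enough to sketch outside Mathematica. The Python below assumes a two-column CSV (`country`, `units`) and a dictionary of search-referral visits keyed by the same country codes. The column names, the numbers and the helper names are hypothetical, and no map is drawn.

```python
import csv
import io

# Hypothetical CSV in the shape described above: country code, unit sales.
SALES_CSV = """country,units
US,400
MX,300
IE,10
"""

# Hypothetical search-referral visits per country code.
visits = {"US": 2000, "MX": 150, "IE": 500, "DE": 80}


def load_sales(text):
    """Parse the CSV text into {country_code: units}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["country"]: int(row["units"]) for row in reader}


def purchases_per_referral(sales, visits):
    """Units sold per search referral, for countries that have any visits."""
    return {
        code: sales.get(code, 0) / count
        for code, count in visits.items()
        if count > 0
    }


rates = purchases_per_referral(load_sales(SALES_CSV), visits)
for code, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(code, round(rate, 3))
```

Countries with visits but no recorded sales get a rate of zero rather than being dropped, so they would still show up (as the darkest shade) on a heat map.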

The interesting outlier here is Mexico. Though they don’t visit the site much, they do buy the app a lot. (Hundreds of times to date.) I’m still not sure why. The two maps also show that, although Ireland sends quite a bit of traffic to the site, it converts quite poorly—perhaps because my .ie domain causes my search rank there to be artificially inflated.

Okay, ignoring the app for a while, we might also be interested in questions like “what are my most important search terms?”. One way of measuring this is to look at the total time on your site due to each search term. (Google Analytics tells you the average time on site for each term, but that’s not much help in telling you where you should direct your work.)

Let’s load the data:

And define a simple function that computes the total number of seconds on the site driven by each keyword:

We can compute the set of search terms that were used to get to the site:

And sort them by time on site:
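A minimal Python sketch of these three steps follows. It assumes each row carries a search phrase, a visit count and an average time on site, and that a phrase's total time (visits times average) is credited to every word in it. Both the row layout and the word-splitting are guesses at what the lost Mathematica cells did.

```python
from collections import defaultdict

# Hypothetical rows: (search phrase, visits, average seconds on site per visit).
rows = [
    ("wikipedia iphone", 300, 95.0),
    ("offline wiki", 150, 120.0),
    ("iphone wiki app", 100, 80.0),
]


def total_seconds_by_word(rows):
    """Credit visits * average time to every distinct word in the phrase."""
    totals = defaultdict(float)
    for phrase, visits, avg_seconds in rows:
        for word in set(phrase.lower().split()):
            totals[word] += visits * avg_seconds
    return dict(totals)


totals = total_seconds_by_word(rows)
# Sort the search terms by total time on site, largest first.
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
for word, seconds in ranked:
    print(f"{word:10s} {seconds:8.0f}")
```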

Ignoring the first two (corresponding to “(not set)”, which we could of course filter out), it seems that “wiki”, “wikipedia” and “iphone” are the important terms.

We might also be interested to see how keyword usage changes over time. Let’s load two months of keyword data:

And define a simple function to count the fraction of referrals that are from a particular keyword in some given dataset:

We can test it with something like:

Okay, 4% of the site’s visitors came by searching for something containing the word “iPhone”.
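For readers without Mathematica, a function of this kind might look like the following Python sketch. The name `keyword_fraction` and the sample rows are hypothetical, chosen so that the answer matches the 4% figure.

```python
def keyword_fraction(rows, word):
    """Fraction of visits whose search phrase contains `word` (case-insensitive)."""
    total = sum(visits for _, visits in rows)
    if total == 0:
        return 0.0
    word = word.lower()
    matching = sum(visits for phrase, visits in rows if word in phrase.lower())
    return matching / total


# Hypothetical (search phrase, visits) rows.
rows = [("iPhone wikipedia", 4), ("emacs charts", 46), ("mathematica", 50)]
print(keyword_fraction(rows, "iphone"))
```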

Now let’s compare two months of data:

These are the search keywords that showed the biggest increase between October and January. And they correspond quite closely to what you’d expect—default.png files on the iPhone and Back To My Mac are both topics I wrote about in the intervening time period.
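One way to express this comparison, sketched in Python with invented numbers: compute each phrase's share of visits in both months and rank by the difference. The function names and data are assumptions, and each phrase is assumed to appear at most once per dataset.

```python
def phrase_shares(rows):
    """Map each search phrase to its share of total visits."""
    total = sum(visits for _, visits in rows)
    if total == 0:
        return {}
    return {phrase: visits / total for phrase, visits in rows}


def biggest_increases(before, after, top=3):
    """Phrases ranked by growth in share of visits between two datasets."""
    old, new = phrase_shares(before), phrase_shares(after)
    # Phrases absent from the earlier month count as a share of zero.
    deltas = {phrase: new[phrase] - old.get(phrase, 0.0) for phrase in new}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical monthly (search phrase, visits) rows.
october = [("offline wikipedia", 90), ("emacs", 10)]
january = [
    ("offline wikipedia", 60),
    ("default.png iphone", 25),
    ("back to my mac", 15),
]

for phrase, delta in biggest_increases(october, january, top=2):
    print(f"{phrase}: {delta:+.2f}")
```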

Keeping with the spirit of analysing changes, we can look at non-search referrals:

We can drop the direct referrals, group them by source, and select the first referral from each site:

And then we can do something like plot the first referral from each site against the amount of traffic it sent—a chart of buzz, basically:
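The grouping step behind that chart can be sketched in Python as below. It assumes rows of (date, source, visits) and that direct traffic is labelled `(direct)`. The hostnames and counts are invented, and the plotting itself is left out.

```python
from datetime import date

# Hypothetical (date, source, visits) rows.
rows = [
    (date(2009, 1, 5), "news.example.com", 40),
    (date(2009, 1, 3), "(direct)", 100),
    (date(2009, 1, 4), "news.example.com", 10),
    (date(2009, 2, 1), "blog.example.org", 5),
]


def first_referrals(rows, direct_label="(direct)"):
    """Map each referring source to (date of first referral, total visits)."""
    first, totals = {}, {}
    for day, source, visits in rows:
        if source == direct_label:
            continue  # drop direct traffic
        if source not in first or day < first[source]:
            first[source] = day
        totals[source] = totals.get(source, 0) + visits
    return {source: (first[source], totals[source]) for source in first}


for source, (day, visits) in sorted(first_referrals(rows).items()):
    print(source, day.isoformat(), visits)
```

Each resulting pair is one point on the chart: first-referral date on one axis, traffic sent on the other.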

Next, we can look at how traffic flows around the site. First, we load (landing page, exit page) tuples:

And then hack together a function to generate a directed graph:
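The Mathematica function is not shown here. A comparable sketch in Python counts each (landing page, exit page) pair as a weighted edge and emits Graphviz DOT text for rendering elsewhere. The page paths and function names are hypothetical.

```python
from collections import Counter


def build_flow_graph(pairs):
    """Return {landing: {exit: count}}, a weighted directed adjacency map."""
    graph = {}
    for (landing, exit_page), count in Counter(pairs).items():
        graph.setdefault(landing, {})[exit_page] = count
    return graph


def to_dot(graph):
    """Serialise the graph as Graphviz DOT, labelling edges with visit counts."""
    lines = ["digraph flow {"]
    for landing in sorted(graph):
        for exit_page, count in sorted(graph[landing].items()):
            lines.append(f'  "{landing}" -> "{exit_page}" [label={count}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical (landing page, exit page) tuples, one per visit.
pairs = [("/", "/about"), ("/", "/about"), ("/blog", "/")]
graph = build_flow_graph(pairs)
print(to_dot(graph))
```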

Yielding a pretty interesting result:

As a final example, we can, for no good reason, take advantage of the latitude and longitude metrics that the API provides and quickly create a video of visits by hour, showing traffic move across the globe. First, we load the data, and define two helper functions:

We can test it out on one of the hours:

Looks good. Let’s export an image for each hour, which we can then join with QuickTime:

The result is here. (You probably want to have it loop when you play it.)

Mathematica doesn’t get much attention from the programming community (largely because of Wolfram’s pricing, as far as I can tell). But its power is undeniable—I spent about 5 hours writing the Google Analytics interface and generating the above data. I wrote this blog post over lunch. If anyone else is interested in using the Mathematica/Google Analytics interface, let me know in the comments, and I’ll package it up and release it somewhere (it requires modified versions of a few libraries).

Lastly, over the past year, I spent a lot of time talking to my friend Avi about the state of web analytics. He and the guys at Dabble DB decided to do something about it, and it looks like dshbrd will launch soon. From what I’ve seen so far, it looks like it’ll be win.

tsocks: a nifty utility now working on OS X

April 25th, 2009

tsocks is a cool Linux utility. Using LD_PRELOAD, it intercepts calls to the OS’s socket-related functions (connect() and co.), and transparently tunnels them through a SOCKS proxy. Example usage:

$ curl http://www.whatismyip.com/automation/n09230945.asp
89.141.232.202
$ tsocks curl http://www.whatismyip.com/automation/n09230945.asp
159.29.64.14

As it happens, curl supports SOCKS proxies, but tsocks allows you to add support to programs that know nothing about them (like, say, wget).
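To give a feel for what happens behind an intercepted connect(), here is a small Python sketch of the request a SOCKS version 4 client sends to the proxy in place of the direct connection. It illustrates the wire format only, under the assumption of a SOCKS4 proxy; tsocks itself does this in C inside the preloaded library, and the function name and addresses below are made up.

```python
import socket
import struct


def socks4_connect_request(dest_ip, dest_port, user_id=b""):
    """Build a SOCKS4 CONNECT request.

    Layout: version (4), command (1 = CONNECT), destination port,
    destination IPv4 address, user id, terminating NUL byte.
    """
    header = struct.pack("!BBH", 4, 1, dest_port)
    return header + socket.inet_aton(dest_ip) + user_id + b"\x00"


# Hypothetical destination: the address the program asked to connect to.
request = socks4_connect_request("1.2.3.4", 80)
print(request.hex())
```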

Sadly, it’s no longer maintained.

Marc Abramowitz got it working on OS X (patch) back in 2006 by switching to DYLD_INSERT_LIBRARIES, among other things, but even this port has succumbed to bit-rot.

So I fixed it up, and the code now lives on GitHub.

Using Mathematica to generate Web 2.0 company names

April 10th, 2009

Feel like calling your company something like Cashcoup, Feebany, Bunkapps, Morpone, Realance or Afative? Combining CrunchBase, Mathematica and stochastic matrices yields the Web 2.0 Company Name Generator:

[Mathematica notebook: input and output cells, originally shown as the images mathematica-names_1.gif through mathematica-names_10.gif.]

My first attempt at automated name generation used a few Gutenberg books, which yielded appropriately Victorian-sounding names. CrunchBase seems to work better. If you want to experiment with the code, download the notebook.
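For anyone without Mathematica, the general technique can be sketched in a few lines of Python. This is not the notebook's code: it assumes a first-order, letter-level model, where the stochastic matrix holds the probability of each letter following the previous one, and the three training names stand in for the CrunchBase list.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # sentinel states marking the start and end of a name


def build_transitions(names):
    """Row-stochastic matrix as nested dicts: matrix[a][b] = P(next=b | current=a)."""
    counts = defaultdict(lambda: defaultdict(int))
    for name in names:
        chars = [START] + list(name.lower()) + [END]
        for current, following in zip(chars, chars[1:]):
            counts[current][following] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix


def generate(matrix, rng, max_len=12):
    """Walk the chain from START until END or max_len letters."""
    letters, state = [], START
    while len(letters) < max_len:
        row = matrix[state]
        state = rng.choices(list(row), weights=list(row.values()))[0]
        if state == END:
            break
        letters.append(state)
    return "".join(letters).capitalize()


# Hypothetical training set; needs at least one non-empty name.
matrix = build_transitions(["flickr", "zynga", "twitter"])
rng = random.Random(2009)
print([generate(matrix, rng) for _ in range(5)])
```

A higher-order model (conditioning on the previous two or three letters) would likely give more pronounceable names, at the cost of needing more training data.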

Update: you can now use the name generator interactively.

Wikipedia app: Steven Troughton-Smith joins the crew

January 1st, 2009

Steven Troughton-Smith is a very talented iPhone developer from Dublin, with a bunch of cool apps under his belt. His blog is full of useful iPhone tidbits (Using Dynamic Library Injection with the iPhone Simulator, On Speed, Development & Design).

Anyway, the cool news is that he’s now going to be spending some time hacking on the offline Wikipedia iPhone app. Stay tuned to see what cool features he cooks up.

Leopard and Back To My Mac tunnels

December 4th, 2008

Back To My Mac seemed like a neat feature when Steve demoed it back at WWDC 07, but very little attention seems to have been paid to it since.

Remote NAT traversal for screen sharing and AFP is cool and all, but the most useful part is hardly mentioned anywhere: Back To My Mac can automatically establish on-the-fly tunnels to any machine with Back To My Mac enabled. You can just ssh foobar.joebloggs.members.mac.com, or curl something directly from the web server, or whatever. So long as you can make outgoing connections, it should work around any routers, firewalls, and other wrinkles in the network topology.

The catch is that it only works over IPv6. sshd on OS X has IPv6 enabled by default, as does Apache, but a lot of other stuff doesn’t.

I haven’t figured out how it works yet. It’s definitely not a straight IPv6 tunnel—the source IP of any connection is a private address (which kinda seems to defeat the purpose of using IPv6 in the first place). Any info or pointers appreciated.

Update: In the comments, JH points out that it’s not a private address, but an RFC 4193 unique local address.
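The distinction is easy to check mechanically: RFC 4193 unique local addresses are the IPv6 addresses inside fc00::/7. A short Python sketch, with made-up sample addresses and a hypothetical helper name:

```python
import ipaddress

ULA_BLOCK = ipaddress.ip_network("fc00::/7")  # RFC 4193 unique local addresses


def is_unique_local(address):
    """True if `address` is an IPv6 address inside fc00::/7."""
    parsed = ipaddress.ip_address(address)
    return parsed.version == 6 and parsed in ULA_BLOCK


# Hypothetical addresses, for illustration only.
for sample in ("fd12:3456:789a::1", "2001:db8::1", "10.0.0.1"):
    print(sample, is_unique_local(sample))
```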

Dynamic Default.png files on the iPhone

November 8th, 2008

John Gruber writes:

I’ve seen third-party iPhone developers complaining that this trick is only available to Apple; they want to use it too. The technical reason why they can’t is that because application bundles are cryptographically signed, you can’t modify the contents of the application bundle (by, in this case, changing the default.png resource file) without breaking the digital signature. Apple could enable this feature for signed applications by providing for a way to specify a dynamic default.png that exists outside the application bundle, somewhere in the application’s private Library folder.

With a bit of hackery, it turns out that you can actually create dynamic Default.png files that don’t cause problems. Here’s a demo of it in action:

This is possible because OS X’s codesign binary (I’ve had far too many run-ins with it while writing the offline Wikipedia browser), used to sign and verify bundles, doesn’t traverse symlinks:

$ codesign -vv Rememberer.app
Rememberer.app: valid on disk
$ touch Rememberer.app/test
$ codesign -vv Rememberer.app
Rememberer.app: a sealed resource is missing or invalid
/Users/patrick/Projects/Rememberer/build/Debug-iphoneos/Rememberer.app/test: resource added
$ rm Rememberer.app/test
$ codesign -vv Rememberer.app
Rememberer.app: valid on disk
$ ls -l Rememberer.app/randomfile
lrwxr-xr-x 1 patrick staff 24 8 Nov 17:21 Rememberer.app/randomfile -> ../Documents/randomfile
$ dd if=/dev/random of=Documents/randomfile count=1
1+0 records in
1+0 records out
512 bytes transferred in 0.000095 secs (5382165 bytes/sec)
$ codesign -vv Rememberer.app
Rememberer.app: valid on disk

This is somewhat understandable; the symlink itself doesn’t change. But if “randomfile” is instead something like “Default.png”, the OS will happily load it from the default path in the application bundle—and follow the symlink—even though the file is actually stored in an area (Documents) that’s dynamically modifiable.
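The effect can be reproduced with a toy verifier. The Python sketch below is not how codesign works internally; it is a hypothetical `seal` function that, like the behaviour observed above, records a symlink's target path rather than the contents behind it. All file and directory names are invented.

```python
import hashlib
import os
import tempfile


def seal(bundle_dir):
    """Digest every file in the bundle; symlinks are sealed by target path only."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(bundle_dir):
        dirs.sort()  # make the traversal order deterministic
        for name in sorted(files):
            path = os.path.join(root, name)
            digest.update(os.path.relpath(path, bundle_dir).encode())
            if os.path.islink(path):
                digest.update(os.readlink(path).encode())  # link text, not contents
            else:
                with open(path, "rb") as handle:
                    digest.update(handle.read())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    bundle = os.path.join(tmp, "Demo.app")
    documents = os.path.join(tmp, "Documents")
    os.makedirs(bundle)
    os.makedirs(documents)
    target = os.path.join(documents, "Default.png")
    with open(target, "wb") as handle:
        handle.write(b"first image")
    os.symlink("../Documents/Default.png", os.path.join(bundle, "Default.png"))

    before = seal(bundle)
    with open(target, "wb") as handle:
        handle.write(b"second image")  # modify the file outside the bundle
    after = seal(bundle)
    with open(os.path.join(bundle, "Default.png"), "rb") as handle:
        loaded = handle.read()  # opening the link follows it into Documents

    print("seal unchanged:", before == after, "| loaded:", loaded)
```

The seal stays the same while a reader of the bundle path sees the new contents, which is the gap the trick relies on.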

I’m guessing that Apple will consider this a bug, and fix it in some future version of the OS. If that happens, though, the downside will probably be nothing worse than losing your dynamic Default.png.

To get it to work in Xcode, you can just add a Run Script phase to the Target:

ln -sf ../Documents/Default.png "$TARGET_BUILD_DIR/$CONTENTS_FOLDER_PATH"

Here’s the Xcode project for the above demo. (Code is public domain.)

Update (Nov 19): TechCrunch pointed out some wider implications of this vulnerability. Although the article was met with some skepticism, they’re basically right. There’s a good summary of the situation on the McAfee Avert Labs Blog.

iPhone hackery: API Explorer

October 29th, 2008

I wrote the offline Wikipedia browser back before there was any official iPhone SDK documentation (or SDK, for that matter), and figuring out the APIs was a bit of a challenge. So in trying to get a handle on things, I wrote an API explorer for showing a rough outline of the system’s classes. It started out as a bare-bones script, and since then I’ve gradually bolted various bits on to it.

Unlike many compiled languages, Objective-C supports pretty powerful runtime introspection. The explorer uses this to present the implemented protocols, methods and instance variables of every loaded class. In addition, if the class responds to initWithFrame: (these are usually subclasses of UIView), you can draw and resize an instance, to get a basic feel for what it does.

It’s all more easily explained with a short screencast:

If you want to play around with it (it works in both the simulator and on the devices themselves), you can download the code.

Worth remembering

October 20th, 2008

Economic theory suggests that financial innovation must lead to failures. And, in particular, since successful innovations are hard to predict, the infrastructure necessary to support innovation needs to lag the innovations themselves, which increases the probability that controls will be insufficient at times to prevent breakdowns in governance mechanisms. Failures, however, do not lead to the conclusion that re-regulation will succeed in stemming future failures. Or that society will be better off with fewer freedoms. Although governments are able to regulate organisational forms, they are unable to regulate the services provided by competing entities, many yet to be born. Organisational forms change with financial innovations. Although functions of finance remain static and are similar in Africa, Asia, Europe and the United States, their provision is dynamic as entities attempt to profit by providing services at lower cost and greater benefit than competing alternatives.

—Myron Scholes (yeah, that Scholes), debating Joseph Stiglitz.

Wikipedia iPhone redux

October 19th, 2008

Back at the start of the year, I blogged about an app I wrote that allows you to store a complete copy of Wikipedia on an iPhone/iPod Touch.

The app got more attention than I expected, with tens of thousands of downloads in the first month, which I think made it one of the more popular apps for the jailbroken iPhone. (I hadn’t anticipated any of this, and the non-existent documentation and installer left many people confused, so someone made a YouTube installation tutorial that has over 57,000 views at time of writing. I’m not sure if that’s good or bad.)

I also released the app’s source code, and it’s been pretty fun to work with a lot of talented people in improving it. The OLPC crew took an interest in it, and thanks to some cool work from Chris Ball and Wade Brainerd, the iPhone application was ported to the XO laptop. Chris announced in June that:

We’re going to be shipping the result to Peru on tens of thousands of laptops in the near future, and it should go up to hundreds of thousands if the other South American countries with OLPC deployments decide to include it in their builds too.

When the iPhone 3G was announced, I didn’t originally intend to port the application to the new version of the OS. The original app was a short Christmas project, and now that I’m working at Live Current, I don’t have much spare time to hack. But after a few hundred emails enquiring about a new version, I eventually felt too guilty not to. So I spent a weekend porting it to iPhone OS 2.0, added a handful of new features, and I’m happy to say that the end result is now available in the App Store.