Monday, 6 May 2013

Splunk: Real-time (web) analytics, powerful data mining and cost effective single customer view

Splunk is a fantastic monitoring and operational intelligence tool, and we are now all trained up here at Datalicious, with certificates to prove it (see end of post). Its most frequent users are systems administrators, but we set out to play around with it and see how we could use it for web analytics. We realised that we could use its powerful, expressive search language and its intuitive charting & visualisation features to do analytics work that’s more difficult, more expensive, or simply not possible in other web analytics suites.

The big philosophy of Splunk is that you just throw all your data into it and worry later about how to report on it and what to do with it. This is great for us: it means we can focus on gathering as much data as possible in the implementation stage of a project, and there’s no risk of getting to the reporting & insights stage only to realise we’ve overlooked something.

We have a setup where all our Google Analytics data is cloned and sent into Splunk. We hacked together a simple, scalable pixel server in Node.js which acts as an intermediary between Google Analytics and our Splunk installation. Our server can handle any pixel request, so we can supplement the data that Google Analytics gathers with anything we want to add in our tracking code, without having to set up Custom Variables in advance and without being limited to 5 of them.
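To make the idea concrete, here is a minimal sketch of what such an intermediary might do with each pixel request: flatten the query string into a single key="value" log line that can be indexed as an event. It is written in Python rather than Node and is not the actual server code; the function name, the timestamp format and the `plan` parameter are assumptions for illustration, and `datalicious` is assumed to be the cookie ID field used in the searches later in this post.

```python
from urllib.parse import urlsplit, parse_qsl


def pixel_request_to_event(request_url, timestamp):
    """Flatten a tracking-pixel request into one key="value" log line.

    Hypothetical helper: every query parameter is kept, so extra tracking
    data needs no setup in advance.
    """
    query = urlsplit(request_url).query
    parts = [timestamp]
    # keep_blank_values=True so empty parameters are still recorded
    for key, value in parse_qsl(query, keep_blank_values=True):
        # Swap double quotes so the value cannot break the key="value" format
        safe_value = value.replace('"', "'")
        parts.append(f'{key}="{safe_value}"')
    return " ".join(parts)


event = pixel_request_to_event(
    "/__utm.gif?utms=3&utmp=%2Fcontact&datalicious=abc123&plan=gold",
    "2013-05-06T10:00:00Z",
)
print(event)
# 2013-05-06T10:00:00Z utms="3" utmp="/contact" datalicious="abc123" plan="gold"
```

Because nothing in the sketch depends on a fixed list of parameters, any new value added to the tracking code simply shows up as another field.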

Once the data is in Splunk, its search language lets us get right at the data and do whatever we want with it. For example, maybe we want to see how many page views our website gets on average per session, to see how our latest site design is performing. We can run this search:

    eventtype=datalicious_GA earliest=-7d | stats avg(utms) AS avg | eval avg=round(avg, 2)

Broken down, it’s pretty simple: we’re looking at the event type called “datalicious_GA”, which has been defined elsewhere. The earliest results we want are from 7 days ago. We “pipe” the output of that search to the “stats” command, which gives us the average of “utms”, Google Analytics’ per-session request counter. We then round it to two decimal places so that it looks a bit nicer, and we get this:

[Figure: average page views]
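For readers who don't know Splunk's search language, the same three steps (filter to the last 7 days, average `utms`, round to two decimal places) can be sketched in plain Python. This only illustrates the logic and says nothing about how Splunk executes it; the event dictionaries and their `time` key are assumptions.

```python
from datetime import datetime, timedelta


def average_utms(events, now, days=7):
    """Average the 'utms' field over events from the last `days` days."""
    cutoff = now - timedelta(days=days)           # earliest=-7d
    values = [e["utms"] for e in events if e["time"] >= cutoff]
    if not values:
        return None                               # nothing in the window
    return round(sum(values) / len(values), 2)    # stats avg(...) | eval round(..., 2)


now = datetime(2013, 5, 6, 12, 0)
events = [
    {"time": now - timedelta(days=1), "utms": 1},
    {"time": now - timedelta(days=2), "utms": 2},
    {"time": now - timedelta(days=3), "utms": 4},
    {"time": now - timedelta(days=30), "utms": 10},  # too old, ignored
]
print(average_utms(events, now))  # 2.33
```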

Fairly simple. But what happens if we realise we want to break those results down by some kind of segmentation which we didn’t plan for in the past? It’s no problem. If at any time in the future we get some additional metadata about our visitors, we can apply that retrospectively to generate segmentations across their full history. For example, let’s say some visitors eventually “convert”, which for our website simply means clicking one of the links to contact us. We could run this more complex search query:

    eventtype="datalicious_GA" | eval type="Non-Converter" | join type=outer datalicious [search eventtype="datalicious_GA" | join datalicious [search eventtype="datalicious_conversion"] | eval type="Converter"] | stats avg(utms) AS avg by type | eval avg=round(avg,2)

This just means we want to do a search for converters, join it to the search result for all visitors, and show the average per-session page views of each of those segments.

[Figure: segmented average page views]
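The join can be hard to read at first, so here is a rough Python equivalent of what the query is doing: label every visitor as a non-converter, overwrite that label for anyone whose ID also appears in a conversion event, then average `utms` per label. It is a sketch of the logic only; the dictionaries are made-up sample data and `datalicious` is assumed to be the visitor's cookie ID field.

```python
from collections import defaultdict


def average_utms_by_type(ga_events, conversion_events):
    """Average 'utms' separately for converters and non-converters."""
    # IDs that appear in at least one conversion event
    converters = {e["datalicious"] for e in conversion_events}

    utms_by_type = defaultdict(list)
    for event in ga_events:
        # Default label is "Non-Converter"; the join overrides it for converters
        label = "Converter" if event["datalicious"] in converters else "Non-Converter"
        utms_by_type[label].append(event["utms"])

    return {
        label: round(sum(values) / len(values), 2)
        for label, values in utms_by_type.items()
    }


ga_events = [
    {"datalicious": "a", "utms": 2},
    {"datalicious": "a", "utms": 4},
    {"datalicious": "b", "utms": 1},
    {"datalicious": "b", "utms": 2},
    {"datalicious": "c", "utms": 3},
]
conversion_events = [{"datalicious": "a"}]
print(average_utms_by_type(ga_events, conversion_events))
# {'Converter': 3.0, 'Non-Converter': 2.0}
```

Note that the label is applied to all of a converter's events, including those from before the conversion happened, which is what makes the segmentation retrospective.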

It’s trivial to look at something like conversions by channel:

Of course, no one wants to look at ugly search strings all day. That’s why we build visualisations:

[Figure: individually segmented page views]

It’s important to emphasise that we can retrospectively apply a segmentation across the full history of all impressions, events and custom data at any time. In the above example, we built a little form and got people from around the office to fill in their name. We associated that with the unique cookie ID they have on our website, and suddenly we can track their individual behaviour over all time. This didn’t have to be the name; it could have been any meaningful segmentation: annual household income, country, favourite musical genre, etc.
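The mechanics are simple enough to sketch in a few lines of Python: keep a table from cookie ID to segment value, and attach that value to every historical event carrying the same ID. This illustrates the idea and is not how Splunk implements it; the names, the `segment` key and the sample data are all hypothetical.

```python
from collections import Counter


def apply_segment(events, segment_by_cookie, default="Unknown"):
    """Return copies of the events with a 'segment' field looked up by cookie ID."""
    return [
        {**event, "segment": segment_by_cookie.get(event["datalicious"], default)}
        for event in events
    ]


def page_views_by_segment(events):
    """Count events per segment value."""
    return Counter(event["segment"] for event in events)


# Events collected long before anyone filled in the form
history = [
    {"datalicious": "a", "page": "/"},
    {"datalicious": "a", "page": "/contact"},
    {"datalicious": "b", "page": "/"},
    {"datalicious": "c", "page": "/blog"},
]
# Segment data that arrives later, keyed by the same cookie ID
names = {"a": "Alice", "b": "Bob"}

segmented = apply_segment(history, names)
print(dict(page_views_by_segment(segmented)))
# {'Alice': 2, 'Bob': 1, 'Unknown': 1}
```

Swapping `names` for a table of countries or income bands would produce a different segmentation of the same history without collecting anything new.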

And of course, we can apply all of those segmentations across data like search keywords:

Source: http://blog.datalicious.com/splunk-blog-post-for-review/
