This week's lecture dealt with Web Analytics. Web Analytics involves collecting data about the visitors who come to your website and what they do there, and then analyzing that data in meaningful ways.
Specifically, we looked at Google Analytics (GA). GA is popular because it is easy to use (you simply insert a small amount of JavaScript code into the page(s) you want to track) and because it is FREE, which is probably the main reason it is one of the leading web analytics tools.
The diagram below (from http://www.ohcpi.com/analytics.html) illustrates how GA works.
- Users connect to your website and download site content
- When their browser renders the HTML, JavaScript embedded in the page makes a call to the Google servers
- Since this call originates from the visitor's computer and not your server, it carries information that is specific to Google's domain (e.g., cookies), as well as information about the page (URL) in which the JavaScript was embedded
- Google assimilates the data it collects from these requests
- Google provides reports and tools that can be used to slice and dice the collected data
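To make the request flow above a little more concrete, here is a minimal sketch of the kind of request a tracking snippet sends. It is written in Python for illustration, although the real snippet runs as JavaScript in the visitor's browser. The endpoint `collector.analytics.example` and the parameter names `cid`, `page`, and `title` are hypothetical placeholders, not GA's actual collection URL or parameters; the point is only that the page's URL and a cookie-derived visitor id travel to the analytics provider inside an ordinary HTTP request.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical collection endpoint; NOT Google's real URL or parameter set.
COLLECT_ENDPOINT = "https://collector.analytics.example/collect"


def build_beacon_url(page_url: str, page_title: str, client_id: str) -> str:
    """Build the request a tracking snippet would send from the visitor's browser.

    page_url and page_title describe the page the snippet is embedded in;
    client_id stands in for an identifier read from a tracking cookie.
    """
    params = {
        "cid": client_id,     # hypothetical: anonymous visitor identifier
        "page": page_url,     # hypothetical: URL of the page being viewed
        "title": page_title,  # hypothetical: document title
    }
    return COLLECT_ENDPOINT + "?" + urlencode(params)


def parse_beacon(url: str) -> dict:
    """What the collector sees on its side: the decoded query parameters."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()}


if __name__ == "__main__":
    beacon = build_beacon_url("https://www.mysite.example/products?id=7", "Products", "4f2a9c")
    print(beacon)
    print(parse_beacon(beacon))
```

The collector never needs access to your server: everything it learns arrives in requests like this one, which the provider then aggregates into reports.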
The lecture discussed the web analytics cycle, emphasizing that analytics is not a one-shot view of things but instead a continuous cycle:
- set goals -- decide what data you are going to collect and what you hope the analysis will show
- measure -- collect the data
- report -- organize the data into a format that can be analyzed
- analyze -- examine the data to determine how it measures up to your goals
- optimize -- make changes as necessary to address any shortcomings or issues you find
- repeat!
The lecture then discussed the Five Ws of web analytics: what, who, when, where, and why.
- What - actions that are being performed on your website: which links visitors click on, etc.
- Who - the audience and its demographics
- When - the time (days/hours, time of year) and how long visitors stay on a particular page
- Where - geographical areas of visitors
- Why - are they buying products? reading blogs? contributing reviews? all of the above?
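The "When" question includes how long visitors stay on a page. One common way analytics tools estimate this is from the gap between consecutive page-view timestamps within a single visit; the last page of a visit has no later hit, so its duration simply cannot be measured. The sketch below assumes that approach and uses made-up pages and times; it is an illustration, not GA's exact calculation.

```python
from datetime import datetime
from typing import Optional


def time_on_page(hits: list[tuple[str, datetime]]) -> list[tuple[str, Optional[float]]]:
    """Estimate seconds spent on each page of ONE visit.

    hits: (page, timestamp) pairs for a single visitor's visit, in any order.
    Each page's time is the gap until the next page view; the final page
    gets None because nothing records when the visitor actually left.
    """
    ordered = sorted(hits, key=lambda hit: hit[1])
    result: list[tuple[str, Optional[float]]] = []
    for i, (page, ts) in enumerate(ordered):
        if i + 1 < len(ordered):
            result.append((page, (ordered[i + 1][1] - ts).total_seconds()))
        else:
            result.append((page, None))
    return result


if __name__ == "__main__":
    visit = [
        ("/home", datetime(2013, 5, 1, 10, 0, 0)),
        ("/products", datetime(2013, 5, 1, 10, 0, 45)),
        ("/checkout", datetime(2013, 5, 1, 10, 3, 0)),
    ]
    print(time_on_page(visit))
```

Here `/home` gets 45 seconds, `/products` gets 135 seconds, and `/checkout` gets `None`, which is why single-page visits are hard to time at all.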
I think these are useful guidelines, and I found them helpful when organizing the GA report I prepared for my client.
The lecture then discussed some of the measures available and (in the second module) went through some examples of information one can get via GA.
What About Ethics?
I think that one thing that was missing from this week's lecture was a discussion around the ethics of web analytics. At what point are we crossing the line when we analyze information about the visitors to our websites?
In one respect, one could say that when users visit your website, you have the right to track what they do and where they go. All web servers log requests, so the fact that someone is accessing the website and what they download (everything including HTML files, JavaScript, images, and documents) is recorded in the web server log.
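To show what a web server log already records without any JavaScript, here is a small sketch that parses lines in the widely used Common Log Format (host, identity, user, timestamp, request line, status, size) and counts requests per path. The sample lines are invented, and real servers are often configured with extended formats, so the regular expression is an assumption about the log layout.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one log line, or None if it doesn't match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


def hits_per_path(lines: Iterable[str]) -> Counter:
    """Count requests per path (HTML pages, scripts, images, documents alike)."""
    counts: Counter = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[entry["path"]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [01/May/2013:10:00:00 -0400] "GET /index.html HTTP/1.1" 200 5120',
        '203.0.113.5 - - [01/May/2013:10:00:01 -0400] "GET /logo.png HTTP/1.1" 200 830',
        '198.51.100.7 - - [01/May/2013:10:02:13 -0400] "GET /index.html HTTP/1.1" 200 5120',
    ]
    print(hits_per_path(sample))
```

Note that every downloaded file shows up, not just pages, and the only visitor identity available is the requesting host.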
However, GA (like other analytics tools) goes one step further: it has you add JavaScript code to your page(s), and the visitor's browser then makes a call to Google's servers. For security reasons, web browsers are supposed to send a cookie only to the site that set it, but this JavaScript trick is a way to sneak around that "limitation". Because the call goes from the user's web browser to Google's server, Google doesn't get any cookies your site may have set (those stay private), but it does get cookies that Google set during some other connection. The response can also set new cookies in the browser that belong to Google. That means cookies set because you ran Google searches, logged into your Gmail account, or visited another site that uses GA are all sent along to Google. Of course, these cookies are not human-readable "Google Search Was Here" notes; they are hexadecimal strings that only mean something when matched against entries in Google's data servers.
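The cookie rule described above can be modeled in a few lines. This is a deliberately simplified sketch: the `Browser` class and the domains `site-a.example` and `tracker.example` are hypothetical, and real browsers also check cookie paths, the Secure flag, SameSite settings, and host-only cookies. It shows only the core idea that a browser attaches cookies based on the host it is sending the request to, not on the page that triggered the request.

```python
from dataclasses import dataclass, field


@dataclass
class Browser:
    # Simplified cookie jar: domain that set the cookie -> {name: value}
    jar: dict[str, dict[str, str]] = field(default_factory=dict)

    def set_cookie(self, domain: str, name: str, value: str) -> None:
        """Store a cookie on behalf of `domain` (e.g., from a response it sent)."""
        self.jar.setdefault(domain, {})[name] = value

    def cookies_for(self, host: str) -> dict[str, str]:
        """Return the cookies this browser would attach to a request to `host`.

        Simplified domain matching: a cookie for 'tracker.example' goes to
        'tracker.example' itself and to subdomains like 'collect.tracker.example'.
        """
        sent: dict[str, str] = {}
        for domain, cookies in self.jar.items():
            if host == domain or host.endswith("." + domain):
                sent.update(cookies)
        return sent


if __name__ == "__main__":
    browser = Browser()
    browser.set_cookie("site-a.example", "session", "a-123")       # set by the site you visit
    browser.set_cookie("tracker.example", "visitor_id", "9f8e7d")  # set earlier by the tracker
    # Loading a page on site A sends only site A's cookie...
    print(browser.cookies_for("www.site-a.example"))
    # ...while the embedded script's call to the tracker sends only the tracker's cookie.
    print(browser.cookies_for("collect.tracker.example"))
```

The tracker never sees `session`, but it gets its own `visitor_id` back from every site that embeds its script.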
For example, using a web developer plugin, one can easily see the cookies set for the google.com domain. All of these cookies would be sent to GA when you navigate to a page containing the GA JavaScript snippet.
What this means is that Google can track a person's behavior across many sites on the Internet. What do they do with this data? We can glean a little information by looking at Google's privacy policy (http://www.google.com/intl/en/policies/privacy), but in reality what they are saying is "trust us".
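Why a shared identifier enables cross-site tracking can be shown with a tiny aggregation sketch. The hit records and their keys (`visitor_id`, `timestamp`, `page`) are a hypothetical schema, not Google's data format; the point is that once hits from many unrelated sites carry the same cookie-based id, grouping by that id reconstructs one person's path across the web.

```python
from collections import defaultdict
from typing import Iterable


def histories_by_visitor(beacon_hits: Iterable[dict]) -> dict[str, list[str]]:
    """Group tracker hits by visitor id, listing each visitor's pages in time order.

    beacon_hits: dicts with hypothetical keys 'visitor_id', 'timestamp'
    (seconds, any comparable number), and 'page' (full URL, any site).
    """
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for hit in sorted(beacon_hits, key=lambda h: h["timestamp"]):
        grouped[hit["visitor_id"]].append(hit["page"])
    return dict(grouped)


if __name__ == "__main__":
    hits = [
        {"visitor_id": "9f8e7d", "timestamp": 300, "page": "https://news.example/article"},
        {"visitor_id": "9f8e7d", "timestamp": 100, "page": "https://shop.example/shoes"},
        {"visitor_id": "1a2b3c", "timestamp": 200, "page": "https://shop.example/"},
        {"visitor_id": "9f8e7d", "timestamp": 250, "page": "https://clinic.example/appointments"},
    ]
    for visitor, pages in histories_by_visitor(hits).items():
        print(visitor, pages)
```

None of the three sites sees the others' traffic, but whoever receives all the hits ends up with visitor `9f8e7d`'s trip from a shoe store to a clinic to a news article.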
Ultimately, if one is collecting PII (personally identifiable information), then a line has probably been crossed (at least in some countries).
The following cartoon (from Measuring Success, 2013) compares internet traffic to traffic on a road and shows the road equivalents of collecting each kind of information.
This highlights a big difference between doing business online in the European Union (EU) versus the US: the EU has strong privacy laws that require that sites get consent from users before collecting much more than the basic, non-individual data. The data that can be collected without consent is basically the same data you can get by analyzing the web server logs.
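The consent distinction above can be pictured as a filter applied before anything is stored. This is a simplified model of the rule exactly as this post describes it (without consent, keep roughly what a server log would have), not a statement of what EU law actually requires; the field names and the `record_hit` function are hypothetical.

```python
# Hypothetical field names: data a web server log would already contain.
SERVER_LOG_FIELDS = {"timestamp", "path", "referrer", "user_agent"}


def record_hit(hit: dict, has_consent: bool) -> dict:
    """Decide what to store for one page view.

    With consent, keep the full hit (including cookie-based identifiers).
    Without it, keep only fields a web server log would have recorded anyway.
    """
    if has_consent:
        return dict(hit)
    return {key: value for key, value in hit.items() if key in SERVER_LOG_FIELDS}


if __name__ == "__main__":
    hit = {
        "timestamp": 1000,
        "path": "/pricing",
        "referrer": "https://search.example/",
        "user_agent": "ExampleBrowser/1.0",
        "visitor_id": "9f8e7d",  # cookie-based identifier: kept only with consent
    }
    print(record_hit(hit, has_consent=False))
    print(record_hit(hit, has_consent=True))
```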
So what can we do? What should we do? Well, as individuals, we can choose to mess up GA (or other analytics) by deleting cookies...
You don't have to delete all your cookies; you can choose to selectively delete cookies from particular domains (but this can end up being a lot of work). Personally, I delete cookies pretty regularly; it has just become a habit of mine. The downside to deleting cookies is that you can lose things like "remember my login" on pages you frequent. Bank of America is always asking for my state and for me to confirm my computer!
As designers/architects of websites, or as people helping guide those who are, we have to decide whether we want to be part of the big information-gathering machine that is Google (or, again, any other analytics system; I don't mean to pick on Google).
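Selective deletion by domain can be illustrated with Python's standard-library `http.cookiejar` as a stand-in for a browser's cookie store (browsers expose this through their own settings pages, not through this API). The domains and cookie names below are made up. `CookieJar.clear(domain)` removes every cookie stored under that domain and raises `KeyError` if there are none, so the helper swallows that case.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(domain: str, name: str, value: str) -> Cookie:
    """Build a simple session cookie for `domain` with plain default attributes."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def delete_cookies_for(jar: CookieJar, domain: str) -> None:
    """Remove every cookie stored under `domain`, leaving other sites' cookies alone."""
    try:
        jar.clear(domain)
    except KeyError:
        pass  # CookieJar.clear raises KeyError when no cookie matches the domain


if __name__ == "__main__":
    jar = CookieJar()
    jar.set_cookie(make_cookie(".tracker.example", "visitor_id", "9f8e7d"))  # hypothetical tracker
    jar.set_cookie(make_cookie(".bank.example", "remember_device", "yes"))   # hypothetical bank
    delete_cookies_for(jar, ".tracker.example")
    print([(c.domain, c.name) for c in jar])
```

After the deletion, the tracker's next request arrives with no `visitor_id`, so it has to issue a new one and the visitor looks like a stranger. Clearing `.bank.example` instead would drop `remember_device`, which is exactly the "confirm my computer" annoyance described above.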