If I look back at the end of the 90s, when I started work in this delightful web/digital sector, I was already using web analytics without being aware of it (because the term “web analytics” was not in use), by regularly consulting tools such as Webalizer (my favourite) or AWStats. They were primitive in terms of graphics, but they provided valuable information on access to websites. They simply displayed the data from the log of the server where the website was hosted. Basic data, but data nonetheless.
In the early 2000s I came across Urchin, a more powerful web analytics tool than Webalizer or AWStats. Google bought Urchin in 2005 and turned it into what we now know as Google Analytics, which launched at the end of that year. I spent an inordinate amount of time working with Google Analytics in 2006, but also with tools such as AWStats and Webalizer, and in the companies that used them I was always asked the same question: “Ricardo, why does Google Analytics give me one number of visits and AWStats another? Which is correct?” The answer is easy enough. They are both right.
What do log-based web analytics tools actually measure?
Easy. As the drawing at the top of this post shows, log-based web analytics tools gather information on log activity and display it as graphs or structured data tables. A log file is a file generated by a server that contains a record of every request (called a hit) that it receives. Literally every request, which means any type of file that is needed in order to “see” a website, such as images, stylesheets and scripts, and not just HTML files. Each request is stored anonymously with details such as the time and date of the request, the IP that sent it, the URL requested and the browser’s user-agent.
Image of a log file. This is how good it looks
In basic terms, a log is a register of a server’s activity, so web analytics tools based on this system will show data about the files and requests that are sent to the host.
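To make those fields concrete, here is a minimal sketch of reading a single hit from a log line. It assumes the server writes the common Apache/Nginx "combined" format, which may not match every host's configuration, and both the `parse_hit` helper and the sample line are made up for illustration.

```python
import re
from datetime import datetime

# Assumption: the server logs in the Apache/Nginx "combined" format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_hit(line):
    """Return one hit as a dict, or None if the line does not match the pattern."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    # Timestamps look like 10/Oct/2006:13:55:36 +0200 (English month names assumed).
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    hit["status"] = int(hit["status"])
    return hit

# Made-up sample line, for illustration only.
sample = ('203.0.113.7 - - [10/Oct/2006:13:55:36 +0200] '
          '"GET /docs/report.pdf HTTP/1.1" 200 2326 '
          '"http://example.com/" "Mozilla/5.0 (compatible; ExampleBot/1.0)"')

hit = parse_hit(sample)
print(hit["ip"], hit["time"].isoformat(), hit["url"], hit["status"])
print(hit["user_agent"])
```

Every line in the file yields one such record, whether the request was for a page, an image or a PDF, which is why log-based tools count hits rather than people.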
So they all measure the same thing, right?
So what is the difference? I don’t get it.
First of all, it must be said that these measuring systems do not replace each other; they complement each other. They do not measure the same thing, so each should be used depending on the type of information that we need at any specific time, the problem we want to address or the inefficiency we want to remedy.
Log-based web analytics tools will provide valuable information about, for example:
- Bots visiting our site
- Requests for files “hanging” from our website, like PDFs or documents.
- Errors on the site linked to HTTP Status.
- Technical details about access to the website: IPs, etc.
- Historical data, since the log is 100% raw and can be reprocessed as many times as we want.
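As a rough illustration of how the items in this list can be read off raw log data, the sketch below counts bot hits, document requests, HTTP errors and IPs. The hits are invented and assumed to be already parsed into dictionaries; the field names, the `summarise` helper and the bot markers are assumptions made for this example, and matching on the user-agent is only a crude heuristic.

```python
from collections import Counter

# Hypothetical, already-parsed hits; field names are assumptions for this sketch.
hits = [
    {"ip": "203.0.113.7", "url": "/index.html", "status": 200,
     "user_agent": "Mozilla/5.0 (Windows NT 10.0) Firefox/115.0"},
    {"ip": "198.51.100.4", "url": "/index.html", "status": 200,
     "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"ip": "203.0.113.7", "url": "/docs/report.pdf", "status": 200,
     "user_agent": "Mozilla/5.0 (Windows NT 10.0) Firefox/115.0"},
    {"ip": "192.0.2.15", "url": "/old-page.html", "status": 404,
     "user_agent": "Mozilla/5.0 (Macintosh) Safari/605.1.15"},
]

BOT_MARKERS = ("bot", "crawler", "spider")  # crude heuristic, not a complete list
DOCUMENT_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx")

def summarise(hits):
    """Count bot hits, document requests, HTTP errors and requests per IP."""
    bots, documents, errors, ips = Counter(), Counter(), Counter(), Counter()
    for hit in hits:
        ips[hit["ip"]] += 1
        if any(marker in hit["user_agent"].lower() for marker in BOT_MARKERS):
            bots[hit["user_agent"]] += 1
        path = hit["url"].split("?", 1)[0]  # ignore any query string
        if path.lower().endswith(DOCUMENT_EXTENSIONS):
            documents[path] += 1
        if hit["status"] >= 400:  # 4xx and 5xx responses
            errors[hit["status"]] += 1
    return {"bots": bots, "documents": documents, "errors": errors, "ips": ips}

report = summarise(hits)
for name, counter in report.items():
    print(name, dict(counter))
```

Because the log is raw, the same file can be run through `summarise` again later with a different list of bot markers or extensions, which is the reprocessing advantage mentioned in the last item.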
Having said that, it should be clear that both systems can and should co-exist, and be used according to the needs that arise as a digital project grows. This is the best approach, because relying on a single measurement system is a mistake: it only offers us a partial view of the website.
In fact, if I had to choose only one system to use for measurement, I would probably choose the logs, which are harder to use but much more reliable overall. We have both approaches, though, so why should we choose?