How log file analysis works with GoAccess
Your web server’s log files tell you almost everything you need to know about the background and behavior of your visitors. By inspecting them, you can find out which browsers your visitors use, how long they stay on your site, how many pages they access, and which search engines or links brought them to your site. All this information makes the log file a first-class source for checking usability and optimizing your web project. Since it’s impractical to evaluate these extensive text files manually, there are various log file analysis tools (log file analyzers) that perform this task and display the results visually. One interesting example of these analyzers is the open source tool GoAccess.
The basics of GoAccess
The developer Gerardo Orellana published the first version of the log file analyzer GoAccess in July 2010. He still maintains it today and continues to develop it on GitHub. GoAccess is free software, so you can use it and adapt it to your own needs. It was originally released under the GNU General Public License (GPL), but since 2016 it has been distributed under the MIT license.
GoAccess is designed to analyze web statistics and present them visually in real time. To do this, the log file analyzer evaluates the log formats of various web servers and cloud services, such as Apache, nginx, Amazon S3, and CloudFront, and presents the results as charts on a dashboard. On Unix-like systems, this dashboard can be viewed either in the terminal or in a web browser. Alternatively, the statistics can be exported in HTML, JSON, or CSV format.
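To make it clearer what evaluating a log format involves, the following Bash sketch splits one line in the NCSA Combined Log Format, which Apache and nginx commonly use for access logs, into the fields that an analyzer such as GoAccess works with. This is not GoAccess's own parser; the helper name parse_combined and the sample line are hypothetical.

```bash
#!/usr/bin/env bash
# Minimal sketch, not GoAccess code: split one NCSA Combined Log Format line
# into the fields a log analyzer reads. parse_combined is a hypothetical helper.

parse_combined() {
  # Layout: ip ident user [time] "request" status bytes "referrer" "user agent"
  local re='^([^ ]+) [^ ]+ [^ ]+ \[([^]]+)] "([^"]*)" ([0-9]{3}) ([0-9]+|-) "([^"]*)" "([^"]*)"$'
  if [[ $1 =~ $re ]]; then
    printf 'ip=%s\n'       "${BASH_REMATCH[1]}"
    printf 'time=%s\n'     "${BASH_REMATCH[2]}"
    printf 'request=%s\n'  "${BASH_REMATCH[3]}"
    printf 'status=%s\n'   "${BASH_REMATCH[4]}"
    printf 'bytes=%s\n'    "${BASH_REMATCH[5]}"
    printf 'referrer=%s\n' "${BASH_REMATCH[6]}"
    printf 'agent=%s\n'    "${BASH_REMATCH[7]}"
  else
    echo "not a Combined Log Format line" >&2
    return 1
  fi
}

# Hypothetical sample line
line='203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "https://www.google.com/" "Mozilla/5.0 (X11; Linux x86_64)"'
parse_combined "$line"
```

Each field feeds different statistics: the IP address drives the visitor and geolocation figures, the request line the requested files, the status code the HTTP status overview, the referrer the referrer statistics, and the user agent the browser and operating system figures.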
GoAccess has minimal system requirements: since it is written in the C programming language, its only prerequisite is the ncurses C library. To use the log file analyzer on Windows, you need Cygwin, a toolkit that provides a Unix-like environment in which many Linux applications can run on Microsoft systems.
The characteristic features of the open source tool
GoAccess requires little to no configuration. You simply select the log file that you want to analyze, start the scan, and the information is then conveniently displayed in real time. The data is grouped into individual categories that show values both for individual measurement periods and for the entire review period. These listings are sorted chronologically by default, but you can also sort the data by the number of views or visitors, the bandwidth consumed, or the time needed to serve the requests (total, average, or maximum). Some values can also be displayed as bar charts or line charts. In addition to the current figures, GoAccess gives you a summary of all log data analyzed so far under 'Overall Analyzed Requests'.
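The idea behind sorting by views or bandwidth can be approximated with standard Unix tools. The following sketch, assuming Combined Log Format input and using the hypothetical function name top_urls, counts views and bytes per URL and lists the most-viewed URLs first, roughly what a requested-files listing sorted by views shows. It illustrates the principle only and says nothing about how GoAccess is implemented internally.

```bash
#!/usr/bin/env bash
# Minimal sketch, not GoAccess code: aggregate requests per URL and sort them
# by views (hits). Assumes Combined Log Format, where whitespace-separated
# field 7 is the URL path and field 10 the response size in bytes.
# top_urls is a hypothetical helper name.

top_urls() {
  awk '
    {
      hits[$7]++                              # one view per request line
      if ($10 ~ /^[0-9]+$/) bytes[$7] += $10  # "-" means no body was sent
    }
    END {
      for (url in hits) printf "%d %d %s\n", hits[url], bytes[url] + 0, url
    }
  ' "$@" | sort -k1,1nr -k3,3   # most views first, ties broken by URL
}

# Usage (hypothetical path): top_urls /var/log/nginx/access.log
# Output columns: views, bytes, URL.
```

To rank by bandwidth instead of views, change the sort key to -k2,2nr; the aggregation itself stays the same.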
Both the terminal and the browser dashboard present these categories and charts clearly, so you can quickly draw conclusions about your visitors and your website. The following table lists the areas covered by the log file analyzer and summarizes what you can learn from each of them.
| Category | Decisive values | Significance for web analysis |
|---|---|---|
| Unique visitors per day - Including spiders | Views, visitors, dates (data) | GoAccess counts all requests that share the same IP address, the same date, and the same user agent as one unique visitor. By tracking the number of visitors over a longer period of time, you can see whether campaigns or new content are successful. |
| Requested Files (URLs) | Views, bandwidth, loading time (Avg. T.S., Cum. T.S., Max. T.S.), URL (data) | This category provides an overview of the most requested URLs. Here you can find out which pages of your web project are particularly popular, how much bandwidth they consume, and how consistent the loading time of each page is. |
| Static Requests | Views, bandwidth, loading time, file (data) | Like the previous category, this one lists files, but only static content such as images, icons, or layout elements. |
| Not Found URLs (404s) | Views, URL (data) | The URLs listed in this category led visitors to a 404 error. You can use this statistic to identify and fix missing pages or broken links. Broken links in particular have a negative effect on users as well as on search engines. |
| Visitor Hostnames and IPs | City, country, hostname, IP (data) | This section shows the hostname (provider) and IP address of your visitors. GoAccess can even provide data on their country and city of origin. These findings can help you present personalized content to users. |
| Operating Systems | Views, visitors, operating system (data) | Here you can see which operating systems your visitors use, sorted by frequency. You can use this data, for example, to estimate how much of your traffic comes from mobile devices. |
| Browsers | Views, visitors, browser (data) | This section shows the types of clients accessing your site. First and foremost, you’ll see the figures for the different browsers, but also whether crawlers are visiting your site and, if so, which ones. |
| Time Distribution | Views, visitors, loading time, hour (data) | This gives you an hourly overview of visitor numbers, so you can see when your users are particularly active and time your advertising or new content accordingly. |
| Virtual Hosts | Views, bandwidth, host (data) | If you run multiple virtual hosts (domains, IP addresses) on your web server, these statistics show which hosts put the most strain on your server. |
| Referrers URLs | Views, URL (data) | The referrer is the entry in the log file that shows which URL visitors came from. You can use it to identify strong partner sites and, if visitors came directly from a search engine, to find out which search terms they used. |
| Referring Sites | Views, website address (data) | Unlike the previous category, this one shows only the general website address of the referring site rather than the full URL. |
| Keyphrases from Google’s search engine | Views, search terms (data) | In addition to the referrer statistics, GoAccess offers a separate list of search queries, at least for Google. This saves you the tedious work of evaluating referrer URLs yourself, and the results can provide useful insight for your keyword strategy. |
| Geo Location | Visitors, origin (data) | Under the heading 'Geo Location', you can see where your visitors’ IP addresses are geographically located. |
| HTTP Status Codes | Views, status code (data) | This section provides an overview of your server’s responses. From it, you can see whether your web server is working properly and whether all content can be accessed without errors. |
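Several of these categories come down to simple counting. The following Bash sketch approximates three of them, assuming Combined Log Format input; the function names are hypothetical. For unique visitors it follows the rule described in the GoAccess documentation, under which requests with the same IP address, date, and user agent count as one visitor. GoAccess itself also handles other log formats, time zones, crawler detection, and much more, so treat this purely as an illustration of the principle.

```bash
#!/usr/bin/env bash
# Minimal sketches, not GoAccess code, of three statistics from the table,
# assuming Combined Log Format input (stdin or files passed as arguments).
# All function names are hypothetical.

# Unique visitors per day: requests sharing IP, date and user agent count once.
unique_visitors_per_day() {
  awk -F'"' '
    {
      split($1, pre, " ")                # pre[1] = IP, pre[4] = "[10/Oct/2023:13:55:36"
      day = substr(pre[4], 2, 11)        # "10/Oct/2023"
      key = day SUBSEP pre[1] SUBSEP $6  # $6 = user agent
      if (!(key in seen)) { seen[key] = 1; visitors[day]++ }
    }
    END { for (d in visitors) print d, visitors[d] }
  ' "$@" | sort
}

# HTTP status codes: number of responses per status code (field 9).
status_codes() {
  awk '{ codes[$9]++ } END { for (c in codes) print c, codes[c] }' "$@" | sort
}

# Time distribution: requests per hour of the day, taken from field 4.
requests_per_hour() {
  awk '{ hours[substr($4, 14, 2)]++ } END { for (h in hours) print h, hours[h] }' "$@" | sort
}
```

For example, unique_visitors_per_day /var/log/nginx/access.log (a hypothetical path) prints one line per day with its number of unique visitors. Note that the days are sorted as text, not chronologically.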
How to install and use GoAccess
To make sure you install the latest version of GoAccess, download the source archive from the official website. On the command line, downloading and installing it works as follows (version 1.0 is used here as an example; substitute the current release number):
$ wget http://tar.goaccess.io/goaccess-1.0.tar.gz
$ tar -xzvf goaccess-1.0.tar.gz
$ cd goaccess-1.0/
$ ./configure --enable-utf8
$ make
# make install
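Do not forget that ncurses is a prerequisite for GoAccess, so a current version of this library should be installed on your system. Because the --enable-utf8 option builds GoAccess with wide-character support, it needs the wide-character variant of the library (ncursesw). If you haven’t installed ncurses yet, you can build the C library from source with the following commands: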
$ wget http://ftp.gnu.org/pub/gnu/ncurses/ncurses-6.0.tar.gz
$ tar xzf ncurses-6.0.tar.gz
$ cd ncurses-6.0
$ ./configure --prefix=/opt/ncurses --enable-widec
$ make
# make install
$ ls -la /opt/ncurses