Webalizer stats for multiple websites

Samuel L. Jackson

Webalizer is web log analysis software licensed under the GPL. It’s written in C and it’s super fast at processing access log files. Configuring Webalizer is very easy, especially if your Apache web server combines all access logs into one log file, but I’ve noticed that many people have trouble configuring it for multiple websites (virtualhosts). There are a couple of ways to make this trouble disappear, but I’ll explain only one, which is, in my opinion, one of the easiest and works in most scenarios. Although I’m using CentOS 6, this tutorial isn’t CentOS-specific. If you’re using another distro you’ll notice some differences, but you’ll get there in the end.

The story

Let’s imagine you have Apache serving a couple of websites on the same server. Each website has its own virtualhost and access log. Let’s suppose that access logs are rotated on a daily basis and that Webalizer’s stats should be generated once a day. Oh, and let’s say that you hate the thought of having gaps in the stats because of access log rotation.

Getting things done

Right, since Webalizer is available in CentOS base repo, you can install it with:

# yum install webalizer

By default, Webalizer’s configuration file is located in /etc/webalizer.conf and its daily cron job is located in /etc/cron.daily/00webalizer. In this case we won’t be needing the cron job, so you can delete it right away. On the other hand, we’ll need the configuration file, but just as a template. To make things easier and more organized I suggest that you create a new directory where you’ll put multiple Webalizer configuration files - one for each website/virtualhost.

# mkdir /etc/webalizer

Next, we’ll create a configuration file for every website. For example:

# cp /etc/webalizer.conf /etc/webalizer/ws1.example.com.conf
# cp /etc/webalizer.conf /etc/webalizer/ws2.example.com.conf

In the configuration files you just created you can tweak a lot of interesting settings (which graphs to generate, how many child processes Webalizer should use for DNS resolving, etc.), but to get everything into a working state you must set 5 main options:

LogFile          /var/log/httpd/ws1.example.com-access_log  # apache access log
OutputDir        /var/www/stats                             # document root for stats pages
HistoryName      /var/lib/webalizer/ws1.example.com.hist    # history file 
IncrementalName  /var/lib/webalizer/ws1.example.com.current # file for saving incremental data
HostName         ws1.example.com                            # website's FQDN
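
If you have more than a couple of websites, editing every copy by hand gets tedious. Here’s a minimal sketch of a helper that builds one site’s configuration from the template and appends the five options. Keep in mind that make_site_conf is a name I made up for illustration, and that the per-site OutputDir subdirectory and the access log naming pattern are assumptions based on the example above, so adjust them to your setup.

```shell
#!/bin/bash
# Sketch: build one Webalizer config per site from a template.
# make_site_conf is a hypothetical helper, not part of Webalizer.
make_site_conf() {
    local template="$1" site="$2" conf_dir="$3"
    local conf="$conf_dir/$site.conf"

    # Copy the template, dropping any active (uncommented) lines
    # that already set one of the five per-site options.
    grep -Ev '^[[:space:]]*(LogFile|OutputDir|HistoryName|IncrementalName|HostName)[[:space:]]' \
        "$template" > "$conf"

    # Append the per-site values (paths are assumptions, adjust as needed).
    cat >> "$conf" <<EOF
LogFile          /var/log/httpd/$site-access_log
OutputDir        /var/www/stats/$site
HistoryName      /var/lib/webalizer/$site.hist
IncrementalName  /var/lib/webalizer/$site.current
HostName         $site
EOF
}
```

Calling make_site_conf /etc/webalizer.conf ws1.example.com /etc/webalizer would write /etc/webalizer/ws1.example.com.conf. The sketch doesn’t create the output directory, so create it yourself before the first run.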

I strongly recommend that you double-check these settings in all configuration files and make sure that the values differ from site to site (OutputDir included). Otherwise you’ll end up mixing the stats data of different websites.
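
Checking this by eye is error-prone, so here’s a rough sketch of how the check could be automated. It prints every value of the five options that shows up in more than one configuration file. The function name find_shared_values is hypothetical, and the sketch assumes each option sits on its own line with the value as the second whitespace-separated field.

```shell
#!/bin/bash
# Sketch: report per-site option values used by more than one config file.
# find_shared_values is a hypothetical helper name.
find_shared_values() {
    local conf_dir="$1" key
    for key in LogFile OutputDir HistoryName IncrementalName HostName; do
        # Print "key value" for every active line with this key,
        # then keep only the lines that occur more than once.
        awk -v key="$key" '$1 == key { print key, $2 }' "$conf_dir"/*.conf \
            | sort | uniq -d
    done
}
```

Running find_shared_values /etc/webalizer should print nothing when every site has its own values. Any line it does print names an option and the value that two or more files share.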

Gimme the stats!

You’re probably wondering how we’re going to generate stats if we got rid of the default cron job. Well, we’ll be using logrotate, which will execute a simple bash script that generates stats just before log rotation. This is very convenient because you don’t have to worry about missing chunks of data that occur when logs are rotated before Webalizer gets hold of them.

If you don’t have logrotate installed, you can simply install it with:

# yum install logrotate

If you configured Apache to save individual access logs in the /var/log/httpd/ folder, you can put the following logrotate configuration in /etc/logrotate.d/httpd:

/var/log/httpd/*log {
    daily
    missingok
    rotate 4
    compress
    delaycompress
    notifempty
    sharedscripts
    prerotate
        /root/scripts/webalizer
    endscript
    postrotate
        /sbin/service httpd reload > /dev/null 2>/dev/null || true
    endscript
}

As you can see, logs are rotated on a daily basis. Rotated logs are compressed and kept for 4 days. The prerotate block calls an external script right before, well, log rotation. Thanks to sharedscripts, the script runs once for all matching logs instead of once per log file. The script looks like this:

#!/bin/bash
lockfile="/tmp/webalizer.lock"
# bail out if lock file still exists
if [ -f "$lockfile" ]; then
        echo "Lock file exists! Webalizer may still be crunching numbers!"
        exit 1
else
        # write the lock file
        date +"%d.%m.%Y - %H:%M" > "$lockfile"
        echo "-------------------------------------"
        echo "[$(date +"%d.%m.%Y - %H:%M")] Generating stats..."
        echo -e "-------------------------------------\n"
        # go through config files and generate stats
        for i in /etc/webalizer/*.conf; do
                webalizer -c "$i"
        done
        echo -e "\n-------------------------------------"
        echo "[$(date +"%d.%m.%Y - %H:%M")] Finished"
        echo -e "-------------------------------------\n"
        # delete the lock file
        rm -f "$lockfile"
fi
exit 0
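
One thing to be aware of: the script checks for the lock file first and writes it afterwards, so two runs that start at almost the same moment could both get past the check. If that worries you, here’s a small sketch that takes the lock in a single step using bash’s noclobber option. The helper names acquire_lock and release_lock are made up for this example.

```shell
#!/bin/bash
# Sketch: take the lock atomically instead of testing and then writing.
# acquire_lock and release_lock are hypothetical helper names.
acquire_lock() {
    local lockfile="$1"
    # With noclobber set, ">" refuses to overwrite an existing file, so
    # checking for the lock and creating it happen in one step.
    # The subshell keeps noclobber from leaking into the rest of the script.
    ( set -o noclobber; date +"%d.%m.%Y - %H:%M" > "$lockfile" ) 2>/dev/null
}

release_lock() {
    rm -f "$1"
}
```

In the script above you could then replace the -f test with something like acquire_lock "$lockfile" || exit 1 and call release_lock "$lockfile" where the lock file is deleted.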

If you are using the above logrotate configuration, you should save the bash script as /root/scripts/webalizer and make it executable with:

# chmod 700 /root/scripts/webalizer

Right, now you’re ready to go. If you don’t want to wait for log rotation to see the results, you can always execute the bash script manually to update the stats.