Configure Web Logs in Apache

One of the many pieces of the Website puzzle is Web logs. Traffic analysis is central to most Websites, and the key to getting the most out of your traffic analysis revolves around how you configure your Web logs.

Apache is one of the most — if not the most — powerful open source solutions for Website operations. You will find that Apache’s Web logging features are flexible for the single Website or for managing numerous domains requiring Web log analysis.

Author’s Note: While most of this piece discusses configuration options for any operating system Apache supports, some of the content will be Unix/Linux (*nix) specific, which now includes Macintosh OS X and its underlying Unix kernel.

For the single site, Apache is pretty much configured for logging in the default install. The initial httpd.conf file (found in /etc/httpd/conf/httpd.conf in most cases) should have a section on logs that looks similar to this (Apache 2.0.x), with descriptive comments for each item. Your default logs folder will be found in /etc/httpd/logs. This location can be changed when dealing with multiple Websites, as we’ll see later. For now, let’s review this section of log configuration.

ErrorLog logs/error_log

LogLevel warn

LogFormat “%h %l %u %t “%r” %>s %b “%{Referer}i” “%{User-Agent}i”” combined

LogFormat “%h %l %u %t “%r” %>s %b” common

LogFormat “%{Referer}i -> %U” referer

LogFormat “%{User-agent}i” agent

CustomLog logs/access_log combined

Error Logs

The error log contains messages sent from Apache for errors encountered during the course of operation. This log is very useful for troubleshooting Apache issues on the server side.

Apache Log Tip: If you are monitoring errors or testing your server, you can use the command line to interactively watch log entries. Open a shell session and type "tail –f /path/to/error_log". This will show you the last few entries in the file and also continue to show new entries as they occur.

There are no real customization options available, other than telling Apache where to establish the file, and what level of error logging you seek to capture. First, let’s look at the error log configuration code from httpd.conf.

ErrorLog logs/error_log

You may wish to store all error-related information in one error log. If so, the above is fine, even for multiple domains. However, you can specify an error log file for each individual domain you have. This is done in the <VirtualHost> container with an entry like this:

<VirtualHost 10.0.0.2>

DocumentRoot "/home/sites/domain1/html/"

ServerName domain1.com

ErrorLog /home/sites/domain1/logs/error.log

</VirtualHost>

If you are responsible for reviewing error log files as a server administrator, it is recommended that you maintain a single error log. If you’re hosting for clients, and they are responsible for monitoring the error logs, it’s more convenient to specify individual error logs they can access at their own convenience.

The setting that controls the level of error logging to capture follows below.

LogLevel warn

Apache’s definitions for their error log levels are as follows:

1299_apachelogstable1 (click to view image)

Tracking Website Activity

Often by default, Apache will generate three activity logs: access, agent and referrer. These track the accesses to your Website, the browsers being used to access the site and referring urls that your site visitors have arrived from.

It is commonplace now to utilize Apache’s “combined” log format, which compiles all three of these logs into one logfile. This is very convenient when using traffic analysis software as a majority of these third-party programs are easiest to configure and schedule when only dealing with one log file per domain.

Let’s break down the code in the combined log format and see what it all means.

LogFormat "%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"" combined

LogFormat starts the line and simply tells Apache you are defining a log file type (or nickname), in this case, combined. Now let’s look at the cryptic symbols that make up this log file definition.

1299_apachelogstable2 (click to view image)

To review all of the available configuration codes for generating a custom log, see Apache’s docs on the module_log_config, which powers log files in Apache.

Apache Log Tip: You could capture more from the HTTP header if you so desired. A full listing and definition of data in the header is found at the World Wide Web Consortium.

For a single Website, the default entry would suffice:

CustomLog logs/access_log combined

However, for logging multiple sites, you have a few options. The most common is to identify individual log files for each domain. This is seen in the example below, again using the log directive within the <VirtualHost> container for each domain.

<VirtualHost 10.0.0.2>

DocumentRoot "/home/sites/domain1/html/"

ServerName domain1.com

ErrorLog /home/sites/domain1/logs/error.log

CustomLog /home/sites/domain1/logs/web.log

</VirtualHost>

<VirtualHost 10.0.0.3>

DocumentRoot “/home/sites/domain2/html/”

ServerName domain2.com

ErrorLog /home/sites/domain2/logs/error.log

CustomLog /home/sites/domain2/logs/web.log

</VirtualHost>

<VirtualHost 10.0.0.4>

DocumentRoot “/home/sites/domain3/html/”

ServerName domain3.com

ErrorLog /home/sites/domain3/logs/error.log

CustomLog /home/sites/domain3/logs/web.log

</VirtualHost>

In the above example, we have three domains with three unique Web logs (using the combined format we defined earlier). A traffic analysis package could then be scheduled to process these logs and generate reports for each domain independently.

This method works well for most hosts. However, there may be situations where this could become unmanageable. Apache recommends a special single log file for large virtual host environments and provides a tool for generating individual logs per individual domain.

We will call this log type the cvh for
mat, standing for “common virtual host.” Simply by adding a %v (which stands for virtual host) to the beginning of the combined log format defined earlier and giving it a new nickname of cvh, we can compile all domains into one log file, then automatically split them into individual log files for processing by a traffic analysis package.

LogFormat "%v %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"" cvh

In this case, we do not make any CustomLog entries in the <VirtualHost> containers and simply have one log file generated by Apache. A program created by Apache called split_logfile is included in the src/support directory of your Apache sources. If you did not compile from source or do not have the sources, you can get the Perl script.

The individual log files created from your master log file will be named for each domain (virtual host) and look like: virtualhost.log.

Log Rotation

Finally, we want to address log rotation. High traffic sites will generate very large log files, which will quickly swallow up valuable disk space on your server. You can use log rotation to manage this process.

There are many ways to handle log rotation, and various third-party tools are available as well. However, we’re focusing on configurations native to Apache, so we will look at a simple log rotation scheme here. I’ll include links to more flexible and sophisticated log rotation options in a moment.

This example uses a rudimentary shell script to move the current Web log to an archive log, compresses the old file and keeps an archive for as long as 12 months, then restarts Apache with a pause to allow the log files to be switched out.

mv web11.tgz web12.tgz

mv web10.tgz web11.tgz

mv web9.tgz web10.tgz

mv web8.tgz web9.tgz

mv web7.tgz web8.tgz

mv web6.tgz web7.tgz

mv web5.tgz web6.tgz

mv web4.tgz web5.tgz

mv web3.tgz web4.tgz

mv web2.tgz web3.tgz

mv web1.tgz web2.tgz

mv web.tgz web1.tgz

mv web.log web.old

/usr/sbin/apachectl graceful

sleep 300

tar cvfz web.tgz web.old

This code can be copied into a file called logrotate.sh, and placed inside the folder where your web.log file is stored (or whatever you name your log file, e.g. access_log, etc.). Just be sure to modify for your log file names and also chmod (change permissions on the file) to 755 so it becomes an executable.

This works fine for a single busy site. If you have more complex requirements for log rotation, be sure to see some of the following sites. In addition, many Linux distributions now come with a log rotation included. For example, Red Hat 9 comes with logrotate.d, a log rotation daemon which is highly configurable. To find out more, on your Linux system with logrotate.d installed, type man logrotate.

Apache Module mod_log_config

Apache Module mod_log_config

Summary

This module provides for flexible logging of client requests. Logs are written in a customizable format, and may be written directly to a file, or to an external program. Conditional logging is provided so that individual requests may be included or excluded from the logs based on characteristics of the request.

Three directives are provided by this module: TransferLog to create a log file, LogFormat to set a custom format, and CustomLog to define a log file and format in one step. The TransferLog and CustomLog directives can be used multiple times in each server to cause each request to be logged to multiple files.

Custom Log Formats

The format argument to the LogFormat and CustomLog directives is a string. This string is used to log each request to the log file. It can contain literal characters copied into the log files and the C-style control characters “n” and “t” to represent new-lines and tabs. Literal quotes and back-slashes should be escaped with back-slashes.

The characteristics of the request itself are logged by placing “%” directives in the format string, which are replaced in the log file by the values as follows:

Format String Description
%% The percent sign (Apache 2.0.44 and later)
%...a Remote IP-address
%...A Local IP-address
%...B Size of response in bytes, excluding HTTP headers.
%...b Size of response in bytes, excluding HTTP headers. In CLF format, i.e. a ‘-‘ rather than a 0 when no bytes are sent.
%...{Foobar}C The contents of cookie Foobar in the request sent to the server.
%...D The time taken to serve the request, in microseconds.
%...{FOOBAR}e The contents of the environment variable FOOBAR
%...f Filename
%...h Remote host
%...H The request protocol
%...{Foobar}i The contents of Foobar: header line(s) in the request sent to the server.
%...l Remote logname (from identd, if supplied). This will return a dash unless IdentityCheck is set On.
%...m The request method
%...{Foobar}n The contents of note Foobar from another module.
%...{Foobar}o The contents of Foobar: header line(s) in the reply.
%...p The canonical port of the server serving the request
%...P The process ID of the child that serviced the request.
%...{format}P The process ID or thread id of the child that serviced the request. Valid formats are pid and tid. (Apache 2.0.46 and later)
%...q The query string (prepended with a ? if a query string exists, otherwise an empty string)
%...r First line of request
%...s Status. For requests that got internally redirected, this is the status of the *original* request — %...>s for the last.
%...t Time the request was received (standard english format)
%...{format}t The time, in the form given by format, which should be in strftime(3) format. (potentially localized)
%...T The time taken to serve the request, in seconds.
%...u Remote user (from auth; may be bogus if return status (%s) is 401)
%...U The URL path requested, not including any query string.
%...v The canonical ServerName of the server serving the request.
%...V The server name according to the UseCanonicalName setting.
%...X Connection status when response is completed:

X = connection aborted before the response completed.
+ = connection may be kept alive after the response is sent.
- = connection will be closed after the response is sent.

(This directive was %...c in late versions of Apache 1.3, but this conflicted with the historical ssl %...{var}c syntax.)

%...I Bytes received, including request and headers, cannot be zero. You need to enable mod_logio to use this.
%...O Bytes sent, including headers, cannot be zero. You need to enable mod_logio to use this.

The “” can be nothing at all (e.g., "%h %u %r %s %b"), or it can indicate conditions for inclusion of the item (which will cause it to be replaced with “-” if the condition is not met). The forms of condition are a list of HTTP status codes, which may or may not be preceded by “!”. Thus, “%400,501{User-agent}i” logs User-agent: on 400 errors and 501 errors (Bad Request, Not Implemented) only; “%!200,304,302{Referer}i” logs Referer: on all requests which did not return some sort of normal status.

The modifiers “<” and “>” can be used for requests that have been internally redirected to choose whether the original or final (respectively) request should be consulted. By default, the % directives %s, %U, %T, %D, and %r look at the original request while all others look at the final request. So for example, %>s can be used to record the final status of the request and %<u can be used to record the original authenticated user on a request that is internally redirected to an unauthenticated resource.

Note that in httpd 2.0 versions prior to 2.0.46, no escaping was performed on the strings from %...r, %...i and %...o. This was mainly to comply with the requirements of the Common Log Format. This implied that clients could insert control characters into the log, so you had to be quite careful when dealing with raw log files.

For security reasons, starting with 2.0.46, non-printable and other special characters are escaped mostly by using xhh sequences, where hh stands for the hexadecimal representation of the raw byte. Exceptions from this rule are " and which are escaped by prepending a backslash, and all whitespace characters which are written in their C-style notation (n, t etc).

Note that in httpd 2.0, unlike 1.3, the %b and %B format strings do not represent the number of bytes sent to the client, but simply the size in bytes of the HTTP response (which will differ, for instance, if the connection is aborted, or if SSL is used). The %O format provided by mod_logio will log the actual number of bytes sent over the network.

Some commonly used log format strings are:

Common Log Format (CLF)
"%h %l %u %t "%r" %>s %b"
Common Log Format with Virtual Host
"%v %h %l %u %t "%r" %>s %b"
NCSA extended/combined log format
"%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i""
Referer log format
"%{Referer}i -> %U"
Agent (Browser) log format
"%{User-agent}i"

Note that the canonical ServerName and Listen of the server serving the request are used for %v and %p respectively. This happens regardless of the UseCanonicalName setting because otherwise log analysis programs would have to duplicate the entire vhost matching algorithm in order to decide what host really served the request.

Security Considerations

See the security tips document for details on why your security could be compromised if the directory where logfiles are stored is writable by anyone other than the user that starts the server.

BufferedLogs Directive

Description: Buffer log entries in memory before writing to disk
Syntax: BufferedLogs On|Off
Default: BufferedLogs Off
Context: server config
Status: Base
Module: mod_log_config
Compatibility: Available in versions 2.0.41 and later.

The BufferedLogs directive causes mod_log_config to store several log entries in memory and write them toget
her to disk, rather than writing them after each request. On some systems, this may result in more efficient disk access and hence higher performance. It may be set only once for the entire server; it cannot be configured per virtual-host.

This directive is experimental and should be used with caution.

CookieLog Directive

Description: Sets filename for the logging of cookies
Syntax: CookieLog filename
Context: server config, virtual host
Status: Base
Module: mod_log_config
Compatibility: This directive is deprecated.

The CookieLog directive sets the filename for logging of cookies. The filename is relative to the ServerRoot. This directive is included only for compatibility with mod_cookies, and is deprecated.

CustomLog Directive

Description: Sets filename and format of log file
Syntax: CustomLog file|pipe format|nickname [env=[!]environment-variable]
Context: server config, virtual host
Status: Base
Module: mod_log_config

The CustomLog directive is used to log requests to the server. A log format is specified, and the logging can optionally be made conditional on request characteristics using environment variables.

The first argument, which specifies the location to which the logs will be written, can take one of the following two types of values:

file
A filename, relative to the ServerRoot.
pipe
The pipe character “|“, followed by the path to a program to receive the log information on its standard input.

Security:

If a program is used, then it will be run as the user who started httpd. This will be root if the server was started by root; be sure that the program is secure.

Note

When entering a file path on non-Unix platforms, care should be taken to make sure that only forward slashed are used even though the platform may allow the use of back slashes. In general it is a good idea to always use forward slashes throughout the configuration files.

The second argument specifies what will be written to the log file. It can specify either a nickname defined by a previous LogFormat directive, or it can be an explicit format string as described in the log formats section.

For example, the following two sets of directives have exactly the same effect:

# CustomLog with format nickname

LogFormat "%h %l %u %t "%r" %>s %b" common

CustomLog logs/access_log common

# CustomLog with explicit format string

CustomLog logs/access_log “%h %l %u %t “%r” %>s %b”

The third argument is optional and controls whether or not to log a particular request based on the presence or absence of a particular variable in the server environment. If the specified environment variable is set for the request (or is not set, in the case of a ‘env=!name‘ clause), then the request will be logged.

Environment variables can be set on a per-request basis using the mod_setenvif and/or mod_rewrite modules. For example, if you want to record requests for all GIF images on your server in a separate logfile but not in your main log, you can use:

SetEnvIf Request_URI .gif$ gif-image

CustomLog gif-requests.log common env=gif-image

CustomLog nongif-requests.log common env=!gif-image

Or, to reproduce the behavior of the old RefererIgnore directive, you might use the following:

SetEnvIf Referer example.com localreferer

CustomLog referer.log referer env=!localreferer

LogFormat Directive

Description: Describes a format for use in a log file
Syntax: LogFormat format|nickname [nickname]
Default: LogFormat "%h %l %u %t "%r" %>s %b"
Context: server config, virtual host
Status: Base
Module: mod_log_config

This directive specifies the format of the access log file.

The LogFormat directive can take one of two forms. In the fi
rst form, where only one argument is specified, this directive sets the log format which will be used by logs specified in subsequent TransferLog directives. The single argument can specify an explicit format as discussed in the custom log formats section above. Alternatively, it can use a nickname to refer to a log format defined in a previous LogFormat directive as described below.

The second form of the LogFormat directive associates an explicit format with a nickname. This nickname can then be used in subsequent LogFormat or CustomLog directives rather than repeating the entire format string. A LogFormat directive that defines a nickname does nothing else — that is, it only defines the nickname, it doesn’t actually apply the format and make it the default. Therefore, it will not affect subsequent TransferLog directives. In addition, LogFormat cannot use one nickname to define another nickname. Note that the nickname should not contain percent signs (%).

Example

LogFormat "%v %h %l %u %t "%r" %>s %b" vhost_common

TransferLog Directive

Description: Specify location of a log file
Syntax: TransferLog file|pipe
Context: server config, virtual host
Status: Base
Module: mod_log_config

This directive has exactly the same arguments and effect as the CustomLog directive, with the exception that it does not allow the log format to be specified explicitly or for conditional logging of requests. Instead, the log format is determined by the most recently specified LogFormat directive which does not define a nickname. Common Log Format is used if no other format has been specified.

Example

LogFormat "%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i""

TransferLog logs/access_log

Web stats- Urchin

This software is used to track the logs of IIS and apache and displays reports.

  • In-house Flexibility: Configure Urchin to fit your specific requirements and process/reprocess log files as frequently as you wish.
  • Great for intranets: Analyze firewall-protected content, such as corporate intranets, without any outside internet connection.
  • Pagetags or IP+User Agent: Choose which methodology works best for you. You can even have the pagetags make a call to your Google Analytics account and run both products together allowing you to audit the pre and post processed data.
  • Advanced Visitor Segmentation: Cross segment visitor behavior by language, geographic location, and other factors.
  • Geo-targeting: Find out where your visitors come from and which markets have the greatest profit potential.
  • Funnel Visualization: Eliminate conversion bottlenecks and reduce the numbers of prospects who drift away unconverted.
  • Complete Conversion Metrics: See ROI, revenue per click, average visitor value and more.
  • Keyword Analysis: Compare conversion metrics across search engines and keywords.
  • A/B Test Reporting: Test banner ads, emails, and keywords and fine-tune your creative content for better results.
  • Ecommerce Analytics: Trace transactions to campaigns and keywords, get loyalty and latency metrics, and see product merchandising reports.
  • Search engine robots, server errors and file type reports: Get the stuff that only log data can report on.
  • Visitor History Drilldown: dig into visitor behavior with the ability to view session/path, platform, geo-location, browser/platform, etc. data on an individual-visitor basis (note: this data is anonymous).
Feature Urchin 6 Google Analytics
Install and manage on your own servers Yes No
Can be used on firewall-protected corporate intranets Yes No
Reprocess historical data (from logfiles) Yes No
Can process/re-process your log files locally Yes No
Can collect information through tags No Yes
Reports on robot/spider activity Yes No
Reports on server errors/status codes Yes No
Tightly integrated with AdWords No Yes
Can report on paid search campaigns Yes Yes
Ecommerce/Conversion reporting Yes Yes
Geotargeting Yes Yes
Free No Yes
Visitor session/navigation path analyses Yes No
Raw data accessible for custom report-building Yes No
Exclusively supported by authorized consultants Yes No

http://www.urchin.com

Web Statistics -AWStats

AWStats is short for Advanced Web Statistics. AWStats is powerful log analyzer which creates advanced web, ftp, mail
and streaming server statistics reports based on the rich data contained in server logs. Data is graphically presented in
easy to read web pages.
Designed with flexibility in mind, AWStats can be run through a web browser CGI (common gateway interface) or directly
from the operating system command line. Through the use of intermediary data base files, AWStats is able to quickly
process large log files, as often desired. With support for both standard and custom log format definitions, AWStats can
analyze log files from Apache (NCSA combined/XLF/ELF or common/CLF log format), Microsoft’s IIS (W3C log format),
WebStar and most web, proxy, wap and streaming media servers as well as ftp and mail server logs.

AWStats’ reports include a wide range of information on your web site usage:
* Number of Visits, and number of Unique visitors.
* Visit duration and latest visits.
* Authenticated Users, and latest authenticated visits.
* Usage by Months, Days of week and Hours of the day (pages, hits, KB).
* Domains/countries (and regions, cities and ISP with Maxmind proprietary geo databases) of visitor’s hosts (pages, hits, KB,
269 domains/countries detected).
* Hosts list, latest visits and unresolved IP addresses list.
* Most viewed, Entry and Exit pages.
* Most commonly requested File types.
* Web Compression statistics (for Apache servers using mod_gzip or mod_deflate modules).
* Visitor’s Browsers (pages, hits, KB for each browser, each version, 123 browsers detected: Web, Wap, Streaming Media
browsers…, around 482 with the “phone browsers” database).
* Visitor’s Operating Systems (pages, hits, KB for each OS, 45 OS detected).
* Robots visits, including search engine crawlers (381 robots detected).
* Search engines, Keywords and Phrases used to find your site (The 122 most famous search engines are detected like
Yahoo, Google, Altavista, etc…)
* HTTP Errors (Page Not Found with latest referrer, …).
* User defined reports based on url, url parameters, referrer (referer) fields extend AWStats’ capabilities to provide even
greater technical and marketing information.
* Number of times your site is added to Bookmarks / Favorites.
* Screen size (to capture this, some HTML tags must be added to a site’s home page).
* Ratio of integrated Browser Support for: Java, Flash, Real G2 player, Quicktime reader, PDF reader, WMA reader (as
above, requires insertion of HTML tags in site’s home page).
* Cluster distribution for load balanced servers.
In addition, AWStats provides the following:
* Wide range of log formats. AWStats can analyze: Apache NCSA combined (XLF/ELF) or common (CLF) log files,
Microsoft IIS log files (W3C), WebStar native log files and other web, proxy, wap, streaming media, ftp and mail server log
files. See AWStats F.A.Q. for examples.
* Reports can be run from the operating system command line and from a web browser as a CGI (common gateway
interface). In CGI mode, dynamic filter capabilities are available for many charts.
* Statistics update can be run from a web browser as well as scheduled for automatic processing.
* Unlimited log file size
What is AWStats / Features Overview 2/87 13/04/2008
* Load balancing system split log files.
* Support ‘nearly sorted’ log files, even for entry and exit pages.
* Reverse DNS lookup before or during analysis; supports DNS cache files.
* Country detection from IP location (geoip) or domain name.
* Plugins for US/Canadian Regions, Cities and major countries regions, ISP and/or Organizations reports (require non free
third product geoipregion, geoipcity, geoipisp and/or geoiporg database).
* WhoIS lookup links.
* Vast array of configurable options/filters and plugins supported.
* Modular design supports inclusion of addition features via plugins.
* Multi−named web sites supported (virtual servers, great for web−hosting providers).
* Cross Site Scripting Attacks protection.
* Reports available in many international languages. See AWStats F.A.Q. for full list. Users can provide files for additional
languages not yet available.
* No need for esoteric perl libraries. AWStats works with all basic perl interpreters.
* Dynamic reports through a CGI interface.
* Static reports in one or framed HTML or XHTML pages; experimental PDF export through 3rd party “htmldoc” software.
* Customize look and color scheme to match your site design; with or without CSS (cascading style sheets).
* Help and HTML tooltips available in reports.
* Easy to use − all configuration directives are confined to one file for each site.
* Analysis database can be stored in XML format for easier use by external applications, like XSLT processing (one xslt
transform example provided).
* A Webmin module is supplied.
* Absolutely free (even for web hosting providers); source code is included (GNU General Public License).
* Works on all platforms with Perl support.
* AWStats has a XML Portable Application Description.
Requirements:
AWStats usage has the following requirements:
* You must have access to the server logs for the reporting you want to perform (web/ftp/mail).
* You must be able to run perl scripts (.pl files) from command line and/or as a CGI. If not, you can solve this by
downloading latest Perl version at ActivePerl (Win32) or Perl.com (Unix/Linux/Other).

reference : http://awstats.sourceforge.net/