The Book of
Postfix

Chapter 25. Troubleshooting Postfix

This chapter contains tips for several different trouble areas, including the system logger, configuration issues, network oddities, and general system issues. As with any kind of troubleshooting, when you're having trouble with Postfix, you need an idea of where the problem is before you can fix it. This is especially true with Postfix, which has several separate subsystems.

1. Problems Starting Postfix and Viewing the Log

The first and most obvious problem is that Postfix might not even be running. Postfix must be running, even if you're only submitting mail using the sendmail command. The easiest way to find out if Postfix is running is to run postfix start:

# postfix start
postfix/postfix-script: starting the Postfix mail system

If you see this message, it means that Postfix wasn't running, so the command tried to start it up. However, if Postfix is already running, you'll get this message:

# postfix start
postfix/postfix-script: fatal: the Postfix mail system is already running

When running this command, you should see similar messages in your mail log, such as these:

Jul  5 22:49:29 mail postfix/postfix-script: starting the Postfix mail system
Jul  5 22:49:29 mail postfix/master[14835]: daemon started -- version 2.1.3

If you don't see these messages, check your syslog configuration immediately. You want to make sure that it logs mail.*; complete logs are essential for any kind of comprehensive troubleshooting. We recommend consolidating all syslog entries for the mail facility (or whichever facility Postfix is configured for) into one log file. Some installations (such as the one in Debian GNU/Linux) split the log into multiple files, but this makes reading the log very tedious. To set it up for easy viewing, your /etc/syslog.conf should have an entry like this:

# (- Log all the mail messages to one place.)
mail.*                                                -/var/log/maillog

Let's say that you see the messages on the command line and log, but you still wonder if Postfix is actually running. Sometimes it pays to be paranoid, because Postfix can start and crash immediately afterward if there's a serious problem. Use ps and grep to see if the Postfix component daemons are really running. If it's working, the command execution should look like this:

# ps aux|grep postfix
root      5035  0.0  0.4  2476 1100 ?        S    09:29   0:00 /usr/lib/postfix/master
postfix   5036  0.0  0.3  2404  936 ?        S    09:29   0:00 pickup -l -t fifo -u -c
postfix   5037  0.0  0.3  2440  964 ?        S    09:29   0:00 qmgr -l -n qmgr -t fifo -u -c

In the preceding output, you can see that the Postfix master daemon is running as root, and the queue manager (qmgr) and pickup service are running as the postfix user, so the system is up and running.
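If you perform this check often, it can be scripted. The following is a minimal sketch, not part of Postfix: the function name postfix_procs_missing is made up for illustration, and it assumes that master, pickup, and qmgr are the processes you expect to see, as in the output above.

```shell
# Print the name of each core Postfix process that does NOT appear in the
# `ps` output supplied on standard input. Prints nothing if all are present.
postfix_procs_missing() {
    ps_output=$(cat)
    for proc in master pickup qmgr; do
        # Match the name as a whole word: preceded by a space or "/",
        # followed by a space or the end of the line.
        if ! printf '%s\n' "$ps_output" | grep -Eq "(^|[ /])${proc}( |\$)"; then
            printf '%s\n' "$proc"
        fi
    done
}

# Usage on a live system:
#   ps aux | postfix_procs_missing
```

Empty output means that all three processes were found. Any name that is printed is a process to look for in the mail log.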

There are several reasons why Postfix might fail to start, but the most common is that a Postfix daemon can't find a shared library. To approach this problem, first find the directories that Postfix uses with this command:

# postconf | grep directory
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/lib/postfix
mail_spool_directory = /var/mail
manpage_directory = /usr/local/man
process_id_directory = pid
program_directory = /usr/sbin
queue_directory = /var/spool/postfix
readme_directory = no
require_home_directory = no
sample_directory = /etc/postfix
tls_random_exchange_name = ${config_directory}/prng_exch

Look for the path in daemon_directory, change to that directory, and look at its contents:

# cd /usr/lib/postfix
# ls -l
total 384
-rwxr-xr-x    1 root     root        16588 Sep 12 18:50 bounce
-rwxr-xr-x    1 root     root        22684 Sep 12 18:50 cleanup
-rwxr-xr-x    1 root     root         4248 Sep 12 18:50 error
-rwxr-xr-x    1 root     root        10344 Sep 12 18:50 flush
-rwxr-xr-x    1 root     root        20508 Sep 12 18:50 lmtp
-rwxr-xr-x    1 root     root        31956 Sep 12 18:50 local
-rwxr-xr-x    1 root     root        22388 Sep 12 18:50 master
-rwxr-xr-x    1 root     root        33084 Sep 12 18:50 nqmgr
-rwxr-xr-x    1 root     root         7248 Sep 12 18:50 pickup
-rwxr-xr-x    1 root     root        10496 Sep 12 18:50 pipe
-rwxr-xr-x    1 root     root        27424 Sep 12 18:50 qmgr
-rwxr-xr-x    1 root     root        12160 Sep 12 18:50 qmqpd
-rwxr-xr-x    1 root     root         7456 Sep 12 18:50 showq
-rwxr-xr-x    1 root     root        25000 Sep 12 18:50 smtp
-rwxr-xr-x    1 root     root        44712 Sep 12 18:50 smtpd
-rwxr-xr-x    1 root     root         5612 Sep 12 18:50 spawn
-rwxr-xr-x    1 root     root        10284 Sep 12 18:50 trivial-rewrite
-rwxr-xr-x    1 root     root        10400 Sep 12 18:50 virtual

The Postfix daemons from the command column in the /etc/postfix/master.cf file should be in this directory. You can inspect the shared library dependencies of a single program with the ldd command (this works on Linux, Solaris, and other common Unix variants; it may be a different command on other systems):

# ldd smtpd
        libpostfix-master.so.1 => /usr/lib/libpostfix-master.so.1 (0x4001d000)
        libpostfix-global.so.1 => /usr/lib/libpostfix-global.so.1 (0x40023000)
        libpostfix-dns.so.1 => /usr/lib/libpostfix-dns.so.1 (0x4003c000)
        libpostfix-util.so.1 => /usr/lib/libpostfix-util.so.1 (0x40040000)
        libdb3.so.3 => /usr/lib/libdb3.so.3 (0x4005d000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x40105000)
        libresolv.so.2 => /lib/libresolv.so.2 (0x40119000)
        libgdbm.so.1 => /usr/lib/libgdbm.so.1 (0x4012a000)
        libc.so.6 => /lib/libc.so.6 (0x40130000)
        libdl.so.2 => /lib/libdl.so.2 (0x4024b000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

The preceding output seems to indicate that everything is in order with the smtpd daemon because every library dependency resolves to an actual file. However, you might be unlucky enough to get this instead:

# ldd smtpd
        libpostfix-master.so.1 => /usr/lib/libpostfix-master.so.1 (0x4001d000)
        libpostfix-global.so.1 => /usr/lib/libpostfix-global.so.1 (0x40023000)
        libpostfix-dns.so.1 => /usr/lib/libpostfix-dns.so.1 (0x4003c000)
        libpostfix-util.so.1 => /usr/lib/libpostfix-util.so.1 (0x40040000)
        libdb3.so.3 => not found
        libnsl.so.1 => /lib/libnsl.so.1 (0x4005d000)
        libresolv.so.2 => /lib/libresolv.so.2 (0x40071000)
        libgdbm.so.1 => /usr/lib/libgdbm.so.1 (0x40082000)
        libc.so.6 => /lib/libc.so.6 (0x40088000)
        libdl.so.2 => /lib/libdl.so.2 (0x401a3000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

In this case, libdb3.so.3 is missing. A program that cannot find all of its shared libraries will not run. If you're running Linux and install a Postfix package intended for another distribution (or even another version of your distribution), it's possible that you may discover this kind of problem only at run time. If this is the case, you need to make a decision.

The best approaches are to find a Postfix package that fits your distribution or to compile Postfix from source code (see Section XX.XX[XREF]). However, if you insist on trying to work with what you have, you can try to find libdb3.so.3 like this:

# find / -name libdb3.so.3
/usr/lib/libdb3.so.3

This command will probably take forever to finish (since it searches your whole filesystem), but if you're lucky enough to find the library, you can add its directory path to the /etc/ld.so.conf file and run ldconfig. Of course, this might invite library and symbol clashes. It's rarely a good idea to mess around with shared libraries unless you really know what you're doing.

The find command may not even help, because the library may not reside on your system. If this is the case, you might be able to find the package that contains the library. However, if you just can't seem to work it out, you need to make a tough decision. If finding a Postfix package that works seems out of the question, and compiling from source code seems daunting, you might consider switching operating systems or distributions.
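Before you make that decision, it can help to know how many libraries are actually missing. The ldd check shown earlier can be repeated for every daemon. The following is a minimal sketch under the assumption that your ldd prints "not found" in the same form as the output above; the function name missing_libs is made up for illustration.

```shell
# Print the name of every shared library that `ldd` reports as "not found".
# Reads ldd output on standard input.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage on a live system, checking every program in daemon_directory:
#   for d in "$(postconf -h daemon_directory)"/*; do
#       ldd "$d" | missing_libs
#   done | sort -u
```

If the loop prints nothing, every dependency resolved and the startup problem lies elsewhere.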

2. Problems Establishing Connections

Postfix runs, but I can't connect to it.

If Postfix starts up fine but doesn't behave as expected, see if your server actually accepts connections on port 25. Connect to the SMTP port to find out. Here's how a successful connection plays out:

# telnet localhost 25
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
220 mail.example.com ESMTP Postfix
QUIT
221 Bye
Connection closed by foreign host.

You may be able to connect to the loopback interface, but this doesn't mean that the entire Internet can. Let's say that your machine is at 10.1.2.233. Try it again, this time connecting to that address:

# telnet 10.1.2.233 25
Trying 10.1.2.233...
Connected to 10.1.2.233.
Escape character is '^]'.
220 mail.example.com ESMTP Postfix
QUIT
221 Bye
Connection closed by foreign host.

If this doesn't work, the first thing to do is look in your main.cf file to see if inet_interfaces has been set but excludes the IP address that you're trying to reach. The default is to listen on all interfaces.

2.1. Checking the Network

If the Postfix configuration seems fine and Postfix has been restarted, but you still can't establish a connection, check the firewall or IP filtering configuration of your network. It's possible that your system blocks the port by default. There are several places to look, because IP filtering can happen through an operating system firewall script (for example, something that calls iptables or ipf) or outside of the machine, by firewall appliances and routers.

You have to check everywhere.

If your configuration seems correct so far, you need to check outside of your local network. Your ISP can block incoming traffic to your port 25 (and, incidentally, outgoing traffic to port 25 on another machine). If you find that your ISP is refusing incoming traffic and they refuse to open up the port, your only recourse is to change ISPs.

To see if an outsider can reach you, run this command:

# telnet relay-test.mail-abuse.org

When you make this connection, relay-test.mail-abuse.org performs an online relay test of the machine that made the connection. If your ISP (or your own firewall) doesn't block incoming connections to your box on port 25, then you should see quite a few messages in your log file.

If you can't connect to the host above, you may be having name resolution problems. Test it with this command:

# host relay-test.mail-abuse.org
relay-test.mail-abuse.org is an alias for cygnus.mail-abuse.org.
cygnus.mail-abuse.org has address 168.61.4.13

You should see an IP address, as shown in the preceding output. If you don't, you can't resolve hostnames. Your /etc/resolv.conf and/or /etc/nsswitch.conf files could be incorrect. It could be even worse; your machine might not even be able to connect to the Internet. Try to ping something. A successful test looks like this (use Control-C to stop the test):

# ping 134.169.9.107
PING 134.169.9.107 (134.169.9.107): 56 data bytes
64 bytes from 134.169.9.107: icmp_seq=0 ttl=54 time=12.1 ms
64 bytes from 134.169.9.107: icmp_seq=1 ttl=54 time=12.1 ms
64 bytes from 134.169.9.107: icmp_seq=2 ttl=54 time=12.1 ms

--- 134.169.9.107 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 12.1/12.1/12.1 ms

2.2. Verifying the Listening Process

If Postfix is running and your network checks out, but your test connections still don't seem to work, see if Postfix is actually listening on port 25 with the netstat command:

# netstat -t -a | grep LISTEN
tcp        0      0 *:printer               *:*                     LISTEN
tcp        0      0 localhost:domain        *:*                     LISTEN
tcp        0      0 *:ssh                   *:*                     LISTEN
tcp        0      0 *:smtp                  *:*                     LISTEN

The preceding output shows that there are servers listening on the printer, domain, SSH, and SMTP ports (check /etc/services for the numerical counterparts of the names). You can see that something is listening on port 25 (smtp). However, is this Postfix or something else? The lsof command can tell you. Try the command that follows. If the output includes sendmail listening on port 25, then your old sendmail binary is still active:

# lsof -i tcp:25
COMMAND    PID USER   FD   TYPE DEVICE SIZE NODE NAME
sendmail 25976 root    4u  IPv4 228618       TCP mail.example.com:smtp (LISTEN)

Kill this process and edit your system startup files so that it won't come back when you reboot (if at all possible, remove sendmail from your system entirely because you're supposed to run Postfix, remember?).

Note

lsof is an extremely powerful tool that can show all open files (and the processes using the files), but it is very dependent on your kernel. Make sure that your lsof is up to date and matches your current kernel. An outdated lsof returns no output for Internet connections at all. Run lsof -i if you're not sure if it works.

If you're using Postfix, the lsof output should include Postfix listening on port 25 with the master daemon:

# lsof -i tcp:25
COMMAND   PID USER   FD   TYPE DEVICE SIZE NODE NAME
master  26079 root   11u  IPv4 228828       TCP *:smtp (LISTEN)
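If you want to script this comparison, the program name can be pulled out of the lsof output. The following is a minimal sketch that assumes the column layout shown above (COMMAND in the first column, "(LISTEN)" at the end of the line); the function name smtp_listeners is made up for illustration.

```shell
# Print the COMMAND column of every listening socket found in `lsof -i`
# output read from standard input. The header line is skipped.
smtp_listeners() {
    awk 'NR > 1 && /\(LISTEN\)/ { print $1 }' | sort -u
}

# Usage on a live system:
#   lsof -i tcp:25 | smtp_listeners
```

An answer of master means that Postfix owns the port. An answer of sendmail, or anything else, means that another program got there first.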

3. Configuration Problems

My configuration is correct, but Postfix doesn't seem to use my settings.

The main.cf file is long and can be difficult to read. A configuration option could appear twice, or a typo could be hidden somewhere in a pile of comments. Use the postconf command to display the configuration that Postfix uses. You can see the difference just from line counts:

# cd /etc/postfix
# cat main.cf | wc -l
# postconf -n | wc -l

The output of postconf -n lists only the parameters that differ from the default main.cf settings. When changing main.cf, you should verify your changes with postconf to see if Postfix sees them.
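Because a parameter that appears twice in main.cf is easy to overlook, a quick scan for duplicates can save time. The following is a minimal sketch, not a Postfix tool: the function name duplicate_params is made up for illustration, and it assumes that each parameter name starts in the first column and that continuation lines start with whitespace.

```shell
# Print each parameter name that is assigned more than once in a main.cf
# read from standard input. Comment lines, blank lines, and continuation
# lines (those that start with whitespace) are skipped.
duplicate_params() {
    awk '
        /^#/ || /^$/ || /^[ \t]/ { next }
        {
            name = $0
            sub(/[ \t]*=.*/, "", name)   # keep only the text before "="
            if (++seen[name] == 2) print name
        }
    '
}

# Usage on a live system:
#   duplicate_params < /etc/postfix/main.cf
```

For any name that this prints, compare the entries in main.cf with the value that postconf reports to see which one Postfix is actually using.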

You might prefer postconf -e configoption=setting to edit main.cf automatically. This little trick allows you to make changes to the Postfix configuration with shell scripts, cron jobs, or similar tools.

If you're approaching a configuration issue, the postconf(1) manual page is worth reading.

4. Reporting Problems

How can I report my Postfix problems and make sure that I don't forget anything?

When first starting out, it can be difficult to judge the kind of information that is appropriate to include in a problem report. The postfinger program (by Simon J. Mudd) is a tool that extracts most of the relevant information. To see what it does, you can mail the postfinger output along with your own questions to yourself like this:

# postfinger | /usr/sbin/sendmail youraddress@your.domain

Of course, this assumes that outgoing mail works on your system. When all else fails, you can transfer the output to another system.

Note

postfinger has been part of the source distribution as of version 2.1. You can also get it at ftp://ftp.WL0.org/SOURCES/postfinger.

5. More About Logging

I don't seem to get enough information in the log.

If you're having trouble zeroing in on problems with specific pieces of your Postfix installation, you can increase the amount of logging information on a per-daemon basis. Do this by appending -v to the daemon configuration entry in /etc/postfix/master.cf, as in this example for smtpd:

# ==========================================================================
# service type  private unpriv  chroot  wakeup  maxproc command + args
#               (yes)   (yes)   (yes)   (never) (50)
# ==========================================================================
smtp      inet  n       -       -       -       -       smtpd -v

To make the change take effect, reload Postfix. The daemon should now be very verbose when it does its work. If this still isn't enough information, you can add another -v to the entry to get even more output. Make sure that you set it back to normal after you're done with debugging, because verbose logging generates lots of lines in your log file, hindering overall system performance.

5.1. Client-Specific Logging

If you have a busy mail server and increasing the log level for all clients would bury you in output, you can also selectively increase logging for certain clients with the debug_peer_list parameter. This example shows how to make the smtpd logging more verbose for only the clients at 10.0.0.1 and 10.0.0.4:

debug_peer_list = 10.0.0.1, 10.0.0.4

You can specify one or more hosts, domains, addresses, and networks as the value to this parameter. To make the change effective immediately, you need to run a postfix reload command.

5.2. Logging and qmgr

One common problem is missing log output from the qmgr process. The queue manager should log like this:

Aug  5 17:05:26 hostname postfix/qmgr[308]: A44F828C71: from=<bamm@example.com>,\
    size=153136, nrcpt=1 (queue active)

If you're missing the log information, there are two possible causes:

libc problems

The libc implementation is broken (the syslog client does not reconnect when the syslogd server is restarted). If this is the case, you should upgrade your libc.

qmgr running chrooted

The Postfix qmgr process is running chrooted (see master.cf), but there is no syslog socket inside the chroot jail. See the syslogd(8) manual page for how to specify additional sockets and specify one for the Postfix chroot jail.

6. Other Configuration Errors

There are three errors that seem to happen all of the time:

Problems opening files

If you have a problem opening a file that seems to exist, see if it's specified as a map in the configuration file (for example, it starts with a hash: prefix). If this is the case, run postmap on the plain text file that contains the map data to create an indexed version.

Verify that the permissions and ownership are correct. Don't forget executable access on the directory and all directories leading up to it.

Permissions problems

If you have permissions problems, you can also see if Postfix can fix them automatically with the post-install command:

# /etc/postfix/post-install set-permissions upgrade-configuration

This command edits main.cf and master.cf as appropriate in addition to fixing permission problems, so you might want to make a backup of your configuration before doing this.

Comments

Any line with a hash (#) as its first character is a comment. Postfix doesn't accept any other comment syntax. If postconf shows a parameter that seems unfamiliar, you may have a misplaced # somewhere in your configuration file.
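A misplaced # is easy to search for. The following is a minimal sketch based only on the rule stated above (a comment must have # as its first character); the function name trailing_hashes is made up for illustration.

```shell
# Print, with line numbers, every line of a configuration file read from
# standard input that contains a "#" but does not start with one.
trailing_hashes() {
    # `|| true` keeps the exit status at zero when nothing is found.
    grep -n '^[^#].*#' || true
}

# Usage on a live system:
#   trailing_hashes < /etc/postfix/main.cf
```

Each line that this prints deserves a second look, because Postfix treats the # and everything after it as part of the parameter value.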

7. Intricacies of the chroot Jail

What is this chroot thing all about?

All too often, the ability to run chrooted causes strange problems. A default Postfix installation does not run chrooted. There are just too many things that can go wrong, so Wietse wisely chose not to chroot by default. Unfortunately, other package maintainers sometimes go a little crazy with security features.

Postfix only works properly in a chroot jail if all the files that it needs are inside the jail (the jail is the queue directory as specified by the queue_directory parameter, and is probably /var/spool/postfix). The package maintainer needs to provide scripts that copy the files from their original locations in the filesystem into the jail. You typically need /etc/resolv.conf and /etc/nsswitch.conf. The Postfix source distribution includes a subdirectory examples/chroot-setup that contains scripts for setting up a chroot jail under different operating systems. Matthias Andree wrote a LINUX2 script that sets things straight on Linux.

In theory, a package maintainer should include the mechanism to build the chroot jail correctly on your particular operating system or distribution. However, if you're just starting out, you could be overwhelmed. Instead of correcting your chroot jail, which you never knew existed at all, you should probably "un-chroot" Postfix's daemons until you get Postfix fully operational. Edit /etc/postfix/master.cf and look at each entry:

# ==========================================================================
# service type  private unpriv  chroot  wakeup  maxproc command + args
#               (yes)   (yes)   (yes)   (never) (50)
# ==========================================================================
smtp      inet  n       -       -       -       -       smtpd

A - (hyphen) in the chroot column indicates that smtpd is running chrooted. Set this to n and reload Postfix. In addition, remember that not every Postfix daemon can run chrooted. Most of them can, but you're likely to encounter bizarre problems if you try to chroot the pipe, local, or virtual daemons.
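To get a quick overview of which services are affected, you can list the entries whose chroot column is not n. The following is a minimal sketch: the function name chrooted_services is made up for illustration, and it assumes that a hyphen means the default of yes, as in the column header shown above. If your master.cf documents a different default, adjust the test accordingly.

```shell
# List the services in a master.cf (read from standard input) whose chroot
# column is "y" or "-".
# ASSUMPTION: "-" selects the default, and the default is "yes".
chrooted_services() {
    awk '
        /^#/ || /^[ \t]/ || NF < 8 { next }   # comments, continuations, short lines
        $5 == "y" || $5 == "-"     { print $1 "/" $2 }
    '
}

# Usage on a live system:
#   chrooted_services < /etc/postfix/master.cf
```

Every service that this prints is a candidate for setting the chroot column to n while you get Postfix working.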

You'll find a lot more information about the chroot process in Chapter XX[XREF].

8. Filesystem Problems

Most modern Unix flavors offer journaling filesystems, but this may not protect you from occasional filesystem corruption, especially if you have bad hardware or are using a newfangled filesystem that hasn't been fully debugged. If there are very strange things happening, such as directories turning into files, consider an immediate reboot with a full forced fsck of all disks with a series of commands like this:

# touch /forcefsck
# sync
# reboot

Yes, you won't have a pretty uptime number and users will complain, but you cannot fix filesystem problems without forcing an fsck. We've successfully annoyed many users on Red Hat and Debian this way.

9. Library Hell

Why can't Postfix find its libraries?

Postfix makes extensive use of shared libraries, such as the BerkeleyDB library. This particular library causes a lot of problems because there are so many different versions of it with different on-disk data formats. All mail service components, such as Postfix, pop-before-smtp, dracd, postgrey, and other tools that access and alter hash: or btree: type maps, need to use compatible BerkeleyDB libraries. It gets even worse; on-disk formats for different versions of BerkeleyDB are incompatible, meaning that an application may not be able to read a map written by another application that uses a different version of BerkeleyDB.

To check the libraries that Postfix uses, use the ldd command as described back in Section XX.XX[XREF]:

# ldd `postconf -h daemon_directory`/smtpd
        libpcre.so.0 => /usr/local/lib/libpcre.so.0 (0x4001d000)
        libdb-3.1.so => /lib/libdb-3.1.so (0x40028000)
        libnsl.so.1 => /lib/libnsl.so.1 (0x400a1000)
        libresolv.so.2 => /lib/libresolv.so.2 (0x400b8000)
        libc.so.6 => /lib/libc.so.6 (0x400ca000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

In the preceding output, smtpd was linked against BerkeleyDB-3.1.x, so all other programs that need to share the Postfix hash: and btree: maps must use the same version of BerkeleyDB (or at least a version that has the same on-disk format).

10. Daemon Inconsistencies

Why do I get strange errors in my log?

If an upgrade of Postfix fails or you do it in a non-standard way (such as installing from source over an RPM install instead of removing the old version first), strange things can happen. You may be mixing daemons from different versions of Postfix. To find out the version of Postfix that a daemon belongs to, use strings on all daemon binaries like this:

# strings /usr/libexec/postfix/smtpd | grep 2003
2.0.13-20030706
20030706
# strings /usr/libexec/postfix/cleanup | grep 2003
2.0.13-20030706
20030706

In this case, the versions (represented as dates) actually match. Please note that if you're using a version from 2004, you may want to grep for 2004 instead of 2003.

10.1. Fork Hell

I just updated and the load shot up.

One common problem that is caused by mixing daemons from incompatible Postfix versions has to do with the tlsmgr daemon. The load appears to be incredibly high, but nothing's running, and there isn't even much mail traffic or queued mail. However, process IDs are increasing constantly.

You probably upgraded Postfix but kept an old tlsmgr and master.cf file that runs tlsmgr.

The problem is that the new Postfix spawns the old tlsmgr, but this daemon immediately exits with status 0 because it can't work with the new version of Postfix. Postfix logs nothing because an exit code of 0 is normal. However, Postfix immediately respawns tlsmgr, and the process repeats itself.

If this turns out to be the case, first comment out the tlsmgr line in master.cf, then reload Postfix to resume reasonably normal service. Then you have time to get a working upgrade with a compatible version of tlsmgr.
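Commenting out the entry can be done by hand or with a one-line filter. The following is a minimal sketch that assumes the tlsmgr entry starts in the first column and occupies a single line; the function name disable_tlsmgr is made up for illustration.

```shell
# Copy a master.cf from standard input to standard output, commenting out
# any line that starts with the service name "tlsmgr".
disable_tlsmgr() {
    sed 's/^tlsmgr[[:space:]]/#&/'
}

# Usage on a live system (write to a new file, review it, then replace):
#   disable_tlsmgr < /etc/postfix/master.cf > /tmp/master.cf.new
```

Writing to a separate file first gives you a chance to compare the result with the original before you install it and reload Postfix.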

11. Stress Testing

How much mail will my box be able to handle?

To find out how much traffic your installation can handle, you need to perform some kind of stress testing. To put an adequate load on the server, you need a fast mail generator. Postfix comes with a pair of testing programs named smtp-source and smtp-sink for just this purpose. Here's how they work:

smtp-source

This program connects to a host on a TCP port (port 25 by default) and sends one or more messages, either sequentially or in parallel. The program speaks either SMTP (the default) or LMTP and is meant to aid in measuring server performance.

smtp-sink

This test server listens on the named host (or address) and port. It receives messages from the network and throws them away. You can measure client and network performance with this program.

Let's start with smtp-source to stress-test your Postfix installation. The following example injects 100 total messages of size 5k each in 20 parallel sessions to a Postfix server running on localhost port 25. Because you're also interested in how much time this takes, use the time command:

$ time ./smtp-source -s 20 -l 5120 -m 100 -c \
  -f sender@example.com -t recipient@example.com localhost:25
100

real    0m4.294s
user    0m0.060s
sys     0m0.030s

The options have the following meanings:

-s 20                       20 parallel sessions
-l 5120                     5k message size
-m 100                      100 total messages
-c                          display a counter
-f sender@example.com       envelope sender
-t recipient@example.com    envelope recipient
localhost:25                target SMTP server

In the example above, injection took 4.294s. However, you also want to know how long actual delivery takes. Check your logs for this, and also to verify that every last message was received.

Now let's turn our attention to smtp-sink to find out how many messages per second your server can handle from your horrible mass mailing software. Postfix has to process each outgoing message even if the server on the other side throws it away (therefore, you can't use this to test the raw performance of your mass mailer unless you connect your mailer directly to smtp-sink).

The following example sets up an SMTP listener on port 25 of localhost:

$ ./smtp-sink -c localhost:25 1000

Now you can run your client tests.

If you want to get an idea of how much overhead the network imposes and also run a control experiment to see the theoretical maximum throughput for a mail server, you can make smtp-source and smtp-sink talk to each other. Open two windows. In the first, start up the dummy server like this:

# ./smtp-sink -c localhost:25 1000
100

With this in place, start throwing messages at this server with smtp-source in the other window:

$ time ./smtp-source -s 20 -l 5120 -m 100 -c \
  -f sender@example.com -t recipient@example.com localhost:25
100

real    0m0.239s
user    0m0.000s
sys     0m0.040s

This output shows that smtp-sink is much faster at accepting messages than Postfix. It took only 0.239 seconds to accept the messages, which is roughly 18 times faster than the Postfix injection process. Now, wouldn't it be nice if you could throw away all incoming email like this?
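To compare runs, it helps to turn the elapsed times into messages per second. The following is a minimal sketch of that arithmetic using the two timings above; the function name msgs_per_sec is made up for illustration.

```shell
# Print messages per second, to one decimal place, for a message count ($1)
# and an elapsed time in seconds ($2).
msgs_per_sec() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n / t }'
}

# The two runs from this section:
#   msgs_per_sec 100 4.294    # injection into Postfix
#   msgs_per_sec 100 0.239    # injection into smtp-sink
```

With these numbers, Postfix accepted about 23.3 messages per second and smtp-sink about 418.4 messages per second. The ratio between the two is where the factor of roughly 18 comes from.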

11.1. Disk I/O

Why do I see huge load, when no process is actually using the processor during stress testing?

When you run your stress testing, you might encounter huge load averages on your machine that seem out of place. Assuming that you don't have any content filtering in place, Postfix is I/O bound, so your I/O subsystem could be saturated.

The output of top may show a high load such as 10.7 even though none of your processes are actually using the CPU. In this case, your load is probably coming from the kernel using most of the CPU for I/O and not letting processes run. Furthermore, the reason that the kernel is doing so much I/O is that many more processes have requested I/O operations (and are now waiting for them).

Linux 2.6 kernels support iowait status in the top command. To see if this is the case on 2.4.x kernels (which don't have a separate means of displaying the iowait status), you can add a kernel module. Oliver Wellnitz wrote such a kernel module that you can download at ftp://ftp.ibr.cs.tu-bs.de/os/linux/people/wellnitz/programming/. This module calculates the load differently and gives you an interface in the /proc filesystem that you can see like this:

# cat /proc/loadavg-io
rq 0.30 0.23 0.14
io 0.08 0.31 0.27

In this example, rq is the number of processes in the state TASK_RUNNING, while io is the number of processes in the state TASK_UNINTERRUPTIBLE (waiting for I/O). The sum of those two is what the kernel usually calls load.
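To check that relationship against the usual load figures, you can add the two rows together. The following is a minimal sketch that assumes the two-row, three-column format shown above; the function name load_from_rq_io is made up for illustration.

```shell
# Read the "rq" and "io" rows from standard input and print their
# column-wise sums, which correspond to the conventional load averages.
load_from_rq_io() {
    awk '
        $1 == "rq" { for (i = 2; i <= 4; i++) rq[i] = $i }
        $1 == "io" { for (i = 2; i <= 4; i++) io[i] = $i }
        END { printf "%.2f %.2f %.2f\n", rq[2] + io[2], rq[3] + io[3], rq[4] + io[4] }
    '
}

# Usage on a system with the module loaded:
#   load_from_rq_io < /proc/loadavg-io
```

For the sample output above, the sums are 0.38, 0.54, and 0.41. When the io row accounts for most of the total, the disks are the bottleneck, not the CPU.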

If you're having problems like this, you need faster disks, or even a solution such as an SSD (a solid-state disk, basically a battery-backed RAM disk) or a mirrored/striped RAID for the queue directory. See Section XX.XX[XREF, in the performance chapter] for more information. One other solution that may or may not work is to remove the synchronous updates for the queue directory. If you're using an ext2 or ext3 filesystem, try this command:

# chattr -R -S /var/spool/postfix/

This setting is actually the default with recent Postfix installations.

11.2. Too Many Connections

Why does Postfix put such a heavy load on my LDAP or SQL server?

When you set up your mail server, you may try to tackle too many problems at once. If you want a stable Postfix system, change one thing at a time. This especially holds true if you want to use LDAP or SQL. Try proceeding like this:

  1. Build your system without LDAP maps (that is, use hash, btree, or dbm maps).

  2. Use appropriate ldapsearch commands to extract all the necessary data from your LDAP server. Use a scripting language such as Perl or Python to reformat it into the Postfix map file input.

  3. When your Postfix is working correctly without LDAP, replace one map at a time with a corresponding LDAP map. Test each LDAP map as user postfix like this:

    $ postmap -q - ldap:mapname < keyfile

    keyfile contains a list of addresses (keys) to query. If the map returns sensible data, change a suitable _maps configuration parameter to have Postfix use the LDAP map.

  4. To consolidate the number of open lookup tables, share one open table among multiple Postfix processes with the proxymap daemon as described in Section XX.XX[XREF].
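The reformatting in step 2 does not have to be done in Perl or Python; a short awk filter is often enough. The following is a minimal sketch under loudly stated assumptions: the attribute names mail (lookup key) and maildrop (result) are hypothetical and depend entirely on your directory schema, entries are separated by blank lines, and no values are base64-encoded or folded across lines. The function name ldif_to_map is made up for illustration.

```shell
# Convert LDIF text on standard input into "key value" lines suitable as
# input for postmap.
# ASSUMPTION: "mail" holds the key and "maildrop" holds the result.
ldif_to_map() {
    awk '
        $1 == "mail:"     { key = $2 }
        $1 == "maildrop:" { value = $2 }
        /^$/ { if (key != "" && value != "") print key, value; key = value = "" }
        END  { if (key != "" && value != "") print key, value }
    '
}

# Usage: pipe the output of your ldapsearch command through the filter,
# save the result as a map source file, and run postmap on that file.
```

Entries that lack either attribute are skipped silently, so compare the number of lines in the result with the number of entries you expected.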