Bug #2711: Cronolog Broken pipe - Lighttpd - lighty labs

Actions

Copy link

Bug #2711

closed

Cronolog Broken pipe

Added by atmnxw about 8 years ago. Updated about 8 years ago.

Status:

Fixed

Priority:

Normal

Category:

Target version:

1.4.40

ASK QUESTIONS IN Forums:

Description

Hi,

Since we updated lighttpd to version 1.4.37 we're having some problems with the error logs.

Each time we reload the server we have this entries in the logs, (around 150 entries by second):

2016-01-27 11:34:04: (mod_accesslog.c.165) writing access log entry failed: |/opt/cronolog/sbin/cronolog /srv/htlogs/lighttpd/lighttpd14_access.%d.%H00 Broken pipe

The same configuration was working fine with version 1.4.36 and before, we're running Centos 6.7, the version of lighttpd is 1.4.39 and the configuration of the server is:

accesslog.filename = "|/opt/cronolog/sbin/cronolog /srv/htlogs/lighttpd/lighttpd14_access.%d.%H00"

Thanks!

Actions

Copy link

Updated by gstrauss about 8 years ago

I am curious if the problem persists or goes away if you use /bin/nohup in the piped log command:

accesslog.filename = "|/bin/nohup /opt/cronolog/sbin/cronolog /srv/htlogs/lighttpd/lighttpd14_access.%d.%H00"

Send the running server a couple SIGHUP signals and then check the logs. Would you test and report back?

Actions

Copy link

Updated by stbuehler about 8 years ago

Target version set to 1.4.x

Actions

Copy link

Updated by atmnxw about 8 years ago

The nohup worked

Thanks a lot!

Actions

Copy link

Updated by gstrauss about 8 years ago

Excellent! Glad that works for you.

So that I might suggest a long-term solution, which modules are you using that require you to send SIGHUP to lighttpd?

Some explanation:
When lighttpd receives a SIGHUP signal, it propagates the signal to all children in its process group (including piped loggers). I think this done so that lighttpd workers receive SIGHUP. Receipt of SIGHUP triggers module hup handlers, but lighttpd does not restart piped loggers, so those they might die and not be restarted.

@stbuehler:

The propagation of SIGHUP to the entire process group is not very graceful, as it might kill piped loggers, any CGI in progress, or any of a slew of other subprocesses. Is the propagation of SIGHUP done only so that lighttpd workers receive SIGHUP, or are there other reasons? If no other reason, then we might reduce the impact by not propagating SIGHUP unless multiple workers are configured. A more proper solution might be to keep track of lighttpd worker pids and to propagate SIGHUP only to those pids.

Separately, is there a reason that piped loggers are not restarted upon receipt of SIGHUP? In a hup handler, mod_accesslog could start a new piped logger, and if successful, close the write end of the pipe to the old piped logger.

Actions

Copy link

Updated by stbuehler about 8 years ago

gstrauss wrote:

Some explanation:
When lighttpd receives a SIGHUP signal, it propagates the signal to all children in its process group (including piped loggers). I think this done so that lighttpd workers receive SIGHUP. Receipt of SIGHUP triggers module hup handlers, but lighttpd does not restart piped loggers, so those they might die and not be restarted.

I'm just not sure what changed; the server.max-workers options is officially not supported (which would be the "forward SIGHUP to ourself" case), and otherwise it should not get forwarded.

Maybe CentOS switched to systemd in that upgrade? That might explain a different process environment. (I think the standard "daemonize" has the session leader exit, while in systemd we use the non-daemonize start, which probably makes lighttpd the session leader. If then the reload sends SIGHUP to the "negative" session leader all processes in the group will get the signal).

@stbuehler:

The propagation of SIGHUP to the entire process group is not very graceful, as it might kill piped loggers, any CGI in progress, or any of a slew of other subprocesses. Is the propagation of SIGHUP done only so that lighttpd workers receive SIGHUP, or are there other reasons? If no other reason, then we might reduce the impact by not propagating SIGHUP unless multiple workers are configured. A more proper solution might be to keep track of lighttpd worker pids and to propagate SIGHUP only to those pids.

A proper max-worker solution would do a lot of things differently. I don't think it's worth our time.

Separately, is there a reason that piped loggers are not restarted upon receipt of SIGHUP? In a hup handler, mod_accesslog could start a new piped logger, and if successful, close the write end of the pipe to the old piped logger.

It probably would be nice to have a proper SIGCHLD handling (it would need to be integrated into the fdevent loop handling probably, as libev for example already has a SIGCHLD handler. Then the pipe loggers would need to get restarted when they exit (and we would also need a way to limit restart to detect permanent failures and so on...).

I see no reason not to restart pipe loggers on SIGHUP though, although I think we probably should first let the old logger exit before starting a new one to avoid race conditions when writing to the same file.

Actions

Copy link

Updated by gstrauss about 8 years ago

Since mod_accesslog is currently the only standard module hooking handle_sighup in lighttpd 1.4, it is likely that those using piped-loggers do not bother sending SIGHUP to lighttpd. (Unlike other servers, lighttpd does not reload its configuration when it receives SIGHUP.)

I created https://github.com/lighttpd/lighttpd1.4/pull/17 with a simple change to avoid forwarding SIGHUP if server.max-workers is not configured. I also noted: For those configuring server.max-workers, it is recommended that piped loggers be used to avoid log corruption, and then admins can avoid sending lighttpd SIGHUP as there is currently no benefit to doing so with the standard modules (beyond that of log rotation of non-piped access and error logs).

(I also agree that introducing pid-management for workers and piped loggers is a larger project, with a larger scope than this problem report.)