lighttpd + lighttpd-angel process "gone"
Added by robymus over 16 years ago
This happened two weeks ago first, the lighttpd and lighttpd-angel processes (both) disappeared - and nothing is shown in the log, access.log shows normal operation in the last 10 minutes before the crash. I just ignored it that time (strange things happen sometimes), but it happened again yesterday. I was using version 1.4.19, I upgraded to 1.4.20 now, after the second crash.
The really strange thing is the angel process disappearing and not restarting lighttpd, as expected from a nice guardian angel :(
I was using lighttpd 1.4.19 at the time of the incidents (i will post my observations with 1.4.20 later, as this happens very rarely).
OS: Debian etch, kernel 2.6.18 + openvz patch, lighttpd running in a virtual server (vz), with plenty of resources.
This is a quite busy webserver with around 9.5 million requests/day.
Server is using selfsigned certificate for https, https load is very small (not properly measured, but less then 0.1%)
In error.log there is nothing immediatelly before the incidents, but in both cases in the 2 hours before the incident, messages like these occur in increased frequency, but not huge amount (the pasted errors are all i got before the crash, which happened at around 02:10
2008-12-08 00:19:34: (connections.c.279) SSL: 1 error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca 2008-12-08 00:19:34: (connections.c.279) SSL: 1 error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure 2008-12-08 00:20:25: (connections.c.279) SSL: 1 error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca 2008-12-08 00:20:25: (connections.c.279) SSL: 1 error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure 2008-12-08 00:20:25: (connections.c.279) SSL: 1 error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure 2008-12-08 00:20:28: (connections.c.279) SSL: 1 error:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown 2008-12-08 00:20:28: (connections.c.279) SSL: 1 error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure 2008-12-08 00:43:29: (connections.c.279) SSL: 1 error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca 2008-12-08 00:43:29: (connections.c.279) SSL: 1 error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure 2008-12-08 00:43:31: (connections.c.279) SSL: 1 error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca 2008-12-08 00:43:31: (connections.c.279) SSL: 1 error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure 2008-12-08 00:43:32: (connections.c.279) SSL: 1 error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca 2008-12-08 00:43:32: (connections.c.279) SSL: 1 error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure 2008-12-08 00:43:45: (connections.c.279) SSL: 1 error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca 2008-12-08 00:43:45: (connections.c.279) SSL: 1 error:140940E5:SSL routines:SSL3_READ_BYTES:ssl handshake failure
These messages appear in the error.log every once in a while, around 4-5 times a day, so it is possible that this is not related to this crash at all. Note, the increased frequency mentioned means all the messages above.
Config file is as follows:
server.modules = ( "mod_rewrite", "mod_access", "mod_auth", "mod_status", "mod_fastcgi", "mod_proxy", "mod_secdownload", "mod_accesslog" ) server.document-root = "/var/www" server.errorlog = "/var/log/lighttpd/error.log" index-file.names = ( "index.php", "index.html", "index.htm" ) server.event-handler = "linux-sysepoll" server.max-fds = 2048 mimetype.assign = ( # .. standard mime types here .. ) accesslog.filename = "/var/log/lighttpd/access.log" url.access-deny = ( "~", ".inc" ) $HTTP["url"] =~ "\.pdf$" { server.range-requests = "disable" } static-file.exclude-extensions = ( ".php" ) server.port = 80 server.bind = "217.20.x.x" $SERVER["socket"] == "127.0.0.1:80" {} $SERVER["socket"] == "10.10.9.2:80" {} server.error-handler-404 = "/404.html" server.pid-file = "/var/run/lighttpd.pid" server.username = "www-data" server.groupname = "www-data" dir-listing.activate = "enable" fastcgi.server = ( ".php" => ( ( "host" => "127.0.0.1", "port" => 8002, "broken-scriptfilename" => "enable" ) ) ) url.rewrite-once = ( "^/x/(.*)" => "/x.php/$1", # .. some more rewrite rules, stripped now ) $HTTP["url"] =~ "^/ref" { proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 8080 ) ) ) } $SERVER["socket"] == "217.20.139.10:443" { ssl.engine = "enable" ssl.pemfile = "/opt/lighttpd/etc/server.pem" server.document-root = "/var/www" status.status-url = "/server/status" status.config-url = "/server/config" auth.backend = "plain" auth.backend.plain.userfile = "/opt/lighttpd/etc/authusers" auth.require = ( "/server" => ( "method" => "digest", "realm" => "Restricted area", "require" => "valid-user" ) ) }
If you have any idea, or this is related to any open issues, please let me know.
If there are some settings to increase error logging verbosity, so it can help analyze the cause of the issue, could you please advise me on the configuration?
I have checked known bugs in 1.4.20 and 1.4.20 changelog, but I didn't match any issues without doubt.
Bug #1782 (Lighty should not crash if it could not write a log file) looked suspicious, i was thinking about log file writing failure during log rotation, but the crashes happened at least 30 minutes after log rotation.
Replies (3)
RE: lighttpd + lighttpd-angel process "gone" - Added by robymus over 16 years ago
It is possible that the cause was
- lighttpd hit "number of open files" limit on OS
- lighttpd hit max-fd
(I have just tuned these settings, as these were not updated to match traffic growth)
But still not a good-enough reason to stop the angel process.
RE: lighttpd + lighttpd-angel process "gone" - Added by manik over 16 years ago
its probably a crash, don't have the core dump ?
- lighttpd hit "number of open files" limit on OS
- lighttpd hit max-fd
In either of these cases, lighty doesn't die but just stops accepting any new connection requests. also an error message to that effect is logged.
RE: lighttpd + lighttpd-angel process "gone" - Added by robymus over 16 years ago
Sorry for not being clear, not real crash - no core dump. Just the processes are gone.
Looks like I have found the reason logging stdout/stderr this time (this was not logged before).
2008-12-09 01:40:32: (network.c.300) can't bind to port: 217.20.x.x 80 Address already in use lighttpd-angel.c.140: child (pid=31829) exited normally with exitcode: 255
And this time at the time of log rotation.
So here it is what happens - after logrotate I try do a graceful restart with the angel process (SIGHUP).
From angel source:
/** do a graceful shutdown on the main process and start a new child */ kill(pid, SIGINT); usleep(5 * 1000); /** wait 5 microsec */ start_process = 1;
As it seems it gives 5 microseconds for graceful shutdown and then restarts process. I suspect this delay is not enough for shutdown (maybe due to system load, doing logrotation/compressing), so the old process is still keeping the socket open. The new process can not start because of this, but the angel process interprets it as normal exit and does not try to start lighty again.
However, from this it seems that the problem is totally with logrotation (and due to something strange the process termination happened later in the first two cases, but today at the same time).
It was my fault totally (probably i skipped something in the documentation) - after logrotation I have sent SIGHUP to the angel process, as I was thinking of the angel as some kind of silver bullet good for everything - but as it seems it is a restart and not a reload operation, as I have thought. Now I have changed the logrotation script to signal SIGHUP to the main lighty process to initiate a log cycling.
This must fix it.
Sorry guys for bothering you with the long report - narrowing down to this simple explanation. I was really scared :)