Bug #1911

segfault with lighttpd 1.4.20 + scgi

Added by kevinsl almost 6 years ago. Updated over 5 years ago.

Status: Fixed
Start date: 2009-02-20
Priority: Normal
Due date:
Assignee: -
% Done: 100%
Category: core
Target version: 1.4.22
Missing in 1.5.x:

Description

Hello, I'm using lighttpd 1.4.20 on CentOS 4.4 (x86_64). I have a Python+scgi app running on localhost, port 4000, and lighttpd connects to it. This works fine in my own testing, but when I send real traffic to the server, lighttpd crashes after about 200 requests. I get this message in /var/log/messages:

kernel: lighttpd[19549]: segfault at 00000000005b0000 rip 00000036455725b0 rsp 0000007fbffff498 error 4

Below is the configuration I'm using:

server.modules = ("mod_compress", "mod_status", "mod_rewrite", "mod_access", "mod_cgi", "mod_accesslog", "mod_setenv", "mod_scgi")
server.event-handler = "linux-sysepoll" 
server.document-root = "/var/www/mysite/static" 
server.port = 8000

$SERVER["socket"] == "10.10.10.34:81" {
 server.document-root = "/var/www/mysite" 
 scgi.server = ( "/" =>
   ( "127.0.0.1" =>
     ( "host" => "127.0.0.1", "port" => 4000, "check-local" => "disable")
   )
 )
 server.tag = "lighttpd" 
 accesslog.format = "%{X-Cluster-Client-Ip}i %l %u %t %{Host}i \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\"" 
 accesslog.filename = "|/usr/sbin/cronolog /var/log/mysite/access_log.%Y%m%d" 
}

mimetype.assign = (
  ".html" => "text/html",
  ".txt" => "text/plain",
  ".jpg" => "image/jpeg",
  ".png" => "image/png",
  ".gif" => "image/gif",
  ".js" => "application/x-javascript",
  ".css" => "text/css",
  ".xsl" => "text/plain",
  ".ico" => "image/x-icon",
  ".src" => "text/plain",
  ".htc" => "text/x-component",
  ".mp3" => "audio/mpeg" 
)

compress.cache-dir = "/var/www/cache/" 
compress.filetype = ("text/plain", "text/html", "text/javascript")

lighttpd.17720.txt (22.2 KB) kevinsl, 2009-02-23 19:17

fix-segfault-in-mod-scgi.patch (2.02 KB) stbuehler, 2009-02-23 22:37

lighttpd.28533.txt (8.51 KB) kevinsl, 2009-02-26 20:19

Associated revisions

Revision 2404
Added by stbuehler over 5 years ago

Fix segfault in mod_scgi (fixes #1911)

History

#1 Updated by kevinsl almost 6 years ago

I've tried to reproduce the problem with ApacheBench (ab) but have not been able to. Also, I don't think this is a problem with my Python/scgi app, because it stays running and produces no errors.

#2 Updated by stbuehler almost 6 years ago

  • Target version changed from 1.4.20 to 1.4.22

#3 Updated by stbuehler almost 6 years ago

  • Status changed from New to Need Feedback

I don't think we can help you without a backtrace (or a way to reproduce it).

#4 Updated by kevinsl over 5 years ago

Ok, here is a traceback produced by valgrind.

#5 Updated by stbuehler over 5 years ago

  1. It would be nice if you could try the attached patch (I have no scgi application to test it with... and "proper" applications shouldn't trigger that bug anyway).
  2. I guess you mixed the line endings "\n" and "\r\n" in the response header - you really should always use "\r\n", or at least always the same.
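
The advice in point 2 can be sketched in Python as follows. This is only an illustration, not the reporter's code: `build_scgi_response` is a hypothetical helper, and it assumes the application writes a CGI-style response (a `Status` line, further headers, a blank line, then the body) back over the SCGI socket.

```python
CRLF = b"\r\n"

def build_scgi_response(status, headers, body):
    """Hypothetical helper: serialize a CGI-style response using only CRLF."""
    lines = [b"Status: " + status.encode("latin-1")]
    for name, value in headers:
        # Reject embedded line breaks so no stray LF can end up in the header block.
        if "\r" in name or "\n" in name or "\r" in value or "\n" in value:
            raise ValueError("header contains a line break: %r" % name)
        lines.append(name.encode("latin-1") + b": " + value.encode("latin-1"))
    # Every header line ends with CRLF; one extra CRLF terminates the header block.
    return CRLF.join(lines) + CRLF + CRLF + body
```

Building the whole header block in one place, instead of concatenating strings with hand-written line endings throughout the application, makes it harder to mix "\n" and "\r\n" by accident.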

#6 Updated by kevinsl over 5 years ago

I wasn't able to try your patch since I'm using a binary distribution. But I checked my code and found a few places where headers had \n and \r\n mixed together. I corrected that and now the application runs better.
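
A check like the one described above could be automated along these lines. This is a rough sketch under stated assumptions: `header_line_endings` is a hypothetical function name, and it assumes you can capture the raw bytes your application sends back to lighttpd. It only inspects the header block and stops at the first blank line.

```python
def header_line_endings(raw_response):
    """Hypothetical check: return the set of line-ending styles ("CRLF", "LF")
    used in the header block of a raw response, up to the first blank line."""
    endings = set()
    pos = 0
    while True:
        nl = raw_response.find(b"\n", pos)
        if nl == -1:
            break  # no further line terminator; header block is incomplete
        # A line ends with CRLF if the byte just before the LF is a CR.
        crlf = nl > pos and raw_response[nl - 1:nl] == b"\r"
        endings.add("CRLF" if crlf else "LF")
        line_end = nl - 1 if crlf else nl
        if line_end == pos:
            break  # blank line: end of the header block, body is not inspected
        pos = nl + 1
    return endings
```

If the returned set contains both "CRLF" and "LF", the header block mixes line endings, which is the situation stbuehler suspected.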

But there is a new problem. Now I get several of these messages in lighttpd's error_log:

2009-02-26 10:13:48: (mod_scgi.c.2467) emergency exit: scgi: connection-fd: 12 fcgi-fd: 9 
2009-02-26 10:14:30: (mod_scgi.c.2467) emergency exit: scgi: connection-fd: 12 fcgi-fd: 10 
2009-02-26 10:14:42: (mod_scgi.c.2467) emergency exit: scgi: connection-fd: 16 fcgi-fd: 17 
2009-02-26 10:16:11: (mod_scgi.c.1790) Connection reset by peer 11 9 
2009-02-26 10:16:11: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 11 fcgi-fd: 9 
2009-02-26 10:16:49: (mod_scgi.c.1790) Connection reset by peer 14 12 
2009-02-26 10:16:49: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 14 fcgi-fd: 12 
2009-02-26 10:16:50: (mod_scgi.c.2467) emergency exit: scgi: connection-fd: 11 fcgi-fd: 9 
2009-02-26 10:17:14: (mod_scgi.c.1790) Connection reset by peer 12 9 
2009-02-26 10:17:14: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 12 fcgi-fd: 9 
2009-02-26 10:20:26: (mod_scgi.c.1790) Connection reset by peer 9 10 
2009-02-26 10:20:26: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 9 fcgi-fd: 10 
2009-02-26 10:21:15: (mod_scgi.c.2467) emergency exit: scgi: connection-fd: 11 fcgi-fd: 13 
2009-02-26 10:22:35: (mod_scgi.c.1790) Connection reset by peer 8 9 
2009-02-26 10:22:35: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 8 fcgi-fd: 9 
2009-02-26 10:22:53: (mod_scgi.c.1790) Connection reset by peer 11 8 
2009-02-26 10:22:53: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 11 fcgi-fd: 8 

While getting these errors, my application runs for about one hour and then lighttpd stops accepting new connections, although the daemon is still running. My scgi app seems fine.

I'm attaching a traceback I captured while these errors were logged.

Any ideas what the problem is?

#7 Updated by stbuehler over 5 years ago

I don't know how that ECONNRESET is triggered for read(), perhaps a strace could help us there. The "emergency exit" is probably triggered when the client aborts the request.

But that should go into a new bug, this one was for the segfault :)

#8 Updated by stbuehler over 5 years ago

  • Status changed from Need Feedback to Fixed
  • % Done changed from 0 to 100

Applied in changeset r2404.
