Bug #1911

closed

segfault with lighttpd 1.4.20 + scgi

Added by kevinsl about 15 years ago. Updated about 15 years ago.

Status:
Fixed
Priority:
Normal
Category:
core
Target version:

Description

Hello, I'm using lighttpd 1.4.20 on CentOS 4.4 (x86_64). I have a Python+scgi app running on localhost, port 4000, and lighttpd connects to it. This works fine in my own testing, but when I send real traffic to the server, lighttpd crashes after about 200 requests. I get this message in /var/log/messages:

kernel: lighttpd[19549]: segfault at 00000000005b0000 rip 00000036455725b0 rsp 0000007fbffff498 error 4

Below is the configuration I'm using:

server.modules = ("mod_compress", "mod_status", "mod_rewrite", "mod_access", "mod_cgi", "mod_accesslog", "mod_setenv", "mod_scgi")
server.event-handler = "linux-sysepoll" 
server.document-root = "/var/www/mysite/static" 
server.port = 8000

$SERVER["socket"] == "10.10.10.34:81" {
 server.document-root = "/var/www/mysite" 
 scgi.server = ( "/" =>
   ( "127.0.0.1" =>
     ( "host" => "127.0.0.1", "port" => 4000, "check-local" => "disable")
   )
 )
 server.tag = "lighttpd" 
 accesslog.format = "%{X-Cluster-Client-Ip}i %l %u %t %{Host}i \"%r\" %s %b \"%{Referer}i\" \"%{User-Agent}i\"" 
 accesslog.filename = "|/usr/sbin/cronolog /var/log/mysite/access_log.%Y%m%d" 
}

mimetype.assign = (
  ".html" => "text/html",
  ".txt" => "text/plain",
  ".jpg" => "image/jpeg",
  ".png" => "image/png",
  ".gif" => "image/gif",
  ".js" => "application/x-javascript",
  ".css" => "text/css",
  ".xsl" => "text/plain",
  ".ico" => "image/x-icon",
  ".src" => "text/plain",
  ".htc" => "text/x-component",
  ".mp3" => "audio/mpeg" 
)

compress.cache-dir = "/var/www/cache/" 
compress.filetype = ("text/plain", "text/html", "text/javascript")

Files

lighttpd.17720.txt (22.2 KB), kevinsl, 2009-02-23 19:17
fix-segfault-in-mod-scgi.patch (2.02 KB), stbuehler, 2009-02-23 22:37
lighttpd.28533.txt (8.51 KB), kevinsl, 2009-02-26 20:19
Actions #1

Updated by kevinsl about 15 years ago

I've tried to reproduce the problem with ApacheBench (ab) but was not able to. Also, I don't think this is a problem with my Python/scgi app, because it stays running and produces no errors.

Actions #2

Updated by stbuehler about 15 years ago

  • Target version changed from 1.4.20 to 1.4.22
Actions #3

Updated by stbuehler about 15 years ago

  • Status changed from New to Need Feedback

I don't think we can help you without a backtrace (or a way to reproduce it).

Actions #4

Updated by kevinsl about 15 years ago

Ok, here is a traceback produced by valgrind.

Actions #5

Updated by stbuehler about 15 years ago

  1. It would be nice if you could try the attached patch (I have no scgi application to test it with... and "proper" applications shouldn't trigger that bug anyway).
  2. I guess you mixed the line endings "\n" and "\r\n" in the response header - you really should always use "\r\n", or at least always the same.
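The advice in point 2 could be followed with a small helper like the one below. This is only a sketch, not code from the reporter's application: the function name `build_scgi_response` and its arguments are hypothetical, and it assumes the app emits a CGI-style header block (a `Status:` line, further headers, then a blank line) ahead of the body.

```python
def build_scgi_response(status, headers, body):
    """Return a CGI-style response whose header lines all end in CRLF.

    status  -- e.g. "200 OK" (hypothetical argument, for illustration)
    headers -- list of (name, value) string pairs
    body    -- response body as bytes
    """
    lines = ["Status: " + status]
    for name, value in headers:
        # Refuse embedded line breaks so a stray "\n" cannot slip into
        # the header block and produce mixed line endings.
        if any(ch in name or ch in value for ch in ("\r", "\n")):
            raise ValueError("line break inside header %r" % name)
        lines.append(name + ": " + value)
    # Join every header line with CRLF and close the block with an empty line.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("latin-1") + body
```

Building the header block in one place means there is a single spot where the line ending is chosen, rather than many string literals that can drift apart.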
Actions #6

Updated by kevinsl about 15 years ago

I wasn't able to try your patch since I'm using a binary distribution. But I checked my code and found a few places where headers had \n and \r\n mixed together. I corrected that and now the application runs better.
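A check along the following lines can help find such places. It is a hedged sketch rather than anything from the original application: `line_endings` and `has_mixed_line_endings` are hypothetical helper names, and the input is assumed to be the raw header block as bytes, exactly as the app would write it to the SCGI socket.

```python
import re

def line_endings(header_block):
    """Return the set of line-ending styles found in a bytes header block."""
    found = set()
    # "\r\n" is listed first so a CRLF pair is counted once, not as CR plus LF.
    for match in re.finditer(rb"\r\n|\n|\r", header_block):
        found.add(match.group())
    return found

def has_mixed_line_endings(header_block):
    """True if the header block uses more than one line-ending style."""
    return len(line_endings(header_block)) > 1
```

For example, `has_mixed_line_endings(b"Status: 200 OK\nContent-Type: text/html\r\n\r\n")` is true, because the first line ends in a bare `\n` while the rest use `\r\n`.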

But there is a new problem. Now I get several of these messages in lighttpd's error_log:

2009-02-26 10:13:48: (mod_scgi.c.2467) emergency exit: scgi: connection-fd: 12 fcgi-fd: 9 
2009-02-26 10:14:30: (mod_scgi.c.2467) emergency exit: scgi: connection-fd: 12 fcgi-fd: 10 
2009-02-26 10:14:42: (mod_scgi.c.2467) emergency exit: scgi: connection-fd: 16 fcgi-fd: 17 
2009-02-26 10:16:11: (mod_scgi.c.1790) Connection reset by peer 11 9 
2009-02-26 10:16:11: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 11 fcgi-fd: 9 
2009-02-26 10:16:49: (mod_scgi.c.1790) Connection reset by peer 14 12 
2009-02-26 10:16:49: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 14 fcgi-fd: 12 
2009-02-26 10:16:50: (mod_scgi.c.2467) emergency exit: scgi: connection-fd: 11 fcgi-fd: 9 
2009-02-26 10:17:14: (mod_scgi.c.1790) Connection reset by peer 12 9 
2009-02-26 10:17:14: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 12 fcgi-fd: 9 
2009-02-26 10:20:26: (mod_scgi.c.1790) Connection reset by peer 9 10 
2009-02-26 10:20:26: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 9 fcgi-fd: 10 
2009-02-26 10:21:15: (mod_scgi.c.2467) emergency exit: scgi: connection-fd: 11 fcgi-fd: 13 
2009-02-26 10:22:35: (mod_scgi.c.1790) Connection reset by peer 8 9 
2009-02-26 10:22:35: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 8 fcgi-fd: 9 
2009-02-26 10:22:53: (mod_scgi.c.1790) Connection reset by peer 11 8 
2009-02-26 10:22:53: (mod_scgi.c.2575) response already sent out, termination connection connection-fd: 11 fcgi-fd: 8 
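To see how often each message occurs, log lines in the format shown above can be tallied by source file and line number. This is a rough sketch: the regular expression is inferred only from the excerpt above, and `tally_by_source` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
#   2009-02-26 10:13:48: (mod_scgi.c.2467) emergency exit: scgi: ...
# The pattern is an assumption based on the excerpt, not a documented format.
LINE_RE = re.compile(r"^(\S+ \S+): \((\w+\.c)\.(\d+)\) (.*)$")

def tally_by_source(log_text):
    """Count log lines per (source file, line number) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m.group(2), int(m.group(3)))] += 1
    return counts
```

Calling `tally_by_source(open(path).read())` on an error log returns a `Counter` keyed by pairs such as `("mod_scgi.c", 2467)`, which makes it easier to tell whether the "emergency exit" or the "Connection reset by peer" messages dominate.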

While these errors are being logged, my application runs for about one hour, and then lighttpd stops accepting new connections even though the daemon is still running. My scgi app seems fine.

I'm attaching a traceback I captured while these errors were logged.

Any ideas what the problem is?

Actions #7

Updated by stbuehler about 15 years ago

I don't know how that ECONNRESET is triggered for read(); perhaps an strace could help us there. The "emergency exit" is probably triggered when the client aborts the request.

But that should go into a new bug; this one was for the segfault :)

Actions #8

Updated by stbuehler about 15 years ago

  • Status changed from Need Feedback to Fixed
  • % Done changed from 0 to 100

Applied in changeset r2404.
