
Bug #760

Random crashing on FreeBSD 6.1

Added by Anonymous over 10 years ago. Updated 9 months ago.

Status:
Fixed
Priority:
Urgent
Assignee:
-
Category:
core
Target version:
Start date:
Due date:
% Done:

100%

Missing in 1.5.x:

Description

Here is the backtrace. lighttpd crashes randomly, about 30-40 times a day, on a fairly heavy-traffic website that serves 30-60 MB files.

lighttpd.trace -- Wilik (1.29 KB) Anonymous, 2006-07-23 04:36

lighttpd.strace - strace file -- Wilik (22.1 KB) Anonymous, 2006-07-23 16:57

lighttpd.conf - -- geoff (1.74 KB) Anonymous, 2007-05-23 00:09

bug760.patch - Patch to work around problem of crashing on large files -- geoff (4.21 KB) Anonymous, 2007-06-26 22:39


Related issues

Related to Bug #949: fastcgi, cgi, flush, php5 problem. Fixed

Associated revisions

Revision 5a91fd4b (diff)
Added by gstrauss 10 months ago

[core] buffer large responses to tempfiles (fixes #758, fixes #760, fixes #933, fixes #1387, #1283, fixes #2083)

This replaces buffering the entire response in memory, which might lead to
a huge memory footprint and possibly to memory exhaustion.

Use tempfiles of fixed size so that disk space is freed as each file is sent.

Update callers of http_chunk_append_mem() and http_chunk_append_buffer()
to handle failures when writing to the tempfile.

x-ref:
"memory fragmentation leads to high memory usage after peaks"
https://redmine.lighttpd.net/issues/758
"Random crashing on FreeBSD 6.1"
https://redmine.lighttpd.net/issues/760
"lighty should buffer responses (after it grows above certain size) on disk"
https://redmine.lighttpd.net/issues/933
"Memory usage increases when proxy+ssl+large file"
https://redmine.lighttpd.net/issues/1283
"lighttpd+fastcgi memory problem"
https://redmine.lighttpd.net/issues/1387
"Excessive Memory usage with streamed files from PHP"
https://redmine.lighttpd.net/issues/2083

Revision 18a7b2be (diff)
Added by gstrauss 9 months ago

[core] option to stream response body to client (fixes #949, #760, #1283, #1387)

Set server.stream-response-body = 1 or server.stream-response-body = 2
to have lighttpd stream response body to client as it arrives from the
backend (CGI, FastCGI, SCGI, proxy).

default: buffer entire response body before sending response to client.
(This preserves existing behavior for now, but may in the future be
changed to stream response to client, which is the behavior more
commonly expected.)
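A minimal lighttpd.conf fragment illustrating the directive described in this commit message. Note that the commit message only states that 1 and 2 both enable streaming; the precise difference between the two values is not spelled out here, so consult the lighttpd documentation for the exact semantics:

```conf
# Sketch based on the commit message above (lighttpd 1.4.40+):
#   0 = buffer entire response body before sending to client (default)
#   1 or 2 = stream response body to client as it arrives from the
#            backend (CGI, FastCGI, SCGI, proxy)
server.stream-response-body = 1
```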

x-ref:
"fastcgi, cgi, flush, php5 problem."
https://redmine.lighttpd.net/issues/949
"Random crashing on FreeBSD 6.1"
https://redmine.lighttpd.net/issues/760
"Memory usage increases when proxy+ssl+large file"
https://redmine.lighttpd.net/issues/1283
"lighttpd+fastcgi memory problem"
https://redmine.lighttpd.net/issues/1387

History

#1 Updated by wiak over 10 years ago

I have the same problem :/
lighttpd crashes randomly under heavy traffic.

#2 Updated by Anonymous over 10 years ago

This has been a problem for me on FreeBSD 6.x since I first started using Lighty on v1.4.10. It's running under supervise now, so it restarts immediately, but it's still annoying and not very impressive.

-- weird_ed

#3 Updated by about 10 years ago

Upgrade to 1.4.13 and see if it still happens.

Also, what ulimits are you running lighttpd under?

#4 Updated by about 10 years ago

Also, paste your lighttpd config.

#5 Updated by Anonymous almost 10 years ago

I'm lucky to be able to reproduce the bug at will, so once I found this report it was easy to confirm that I'm having the same problem. Even better, it was also trivial to identify the proximate cause.

The problem is a malloc failure in buffer_prepare_copy. That, in turn, is caused by a massive memory leak. Lighty's process size when it died was 3013792 KB, or over 3 GB. Not coincidentally, the file I was downloading is 3.8 GB in size. Clearly, lighty is either trying to cache the entire file internally, or failing to free buffers as the copy progresses.

A test with a smaller file (0.5 GB) revealed that the process remains large after the file has been downloaded. Since the modern malloc often returns freed space to the system, this indicates that it's a plain memory leak. That, in turn, ought to make the bug pretty easy to find.

-- geoff

#6 Updated by darix almost 10 years ago

what is your usage pattern for lighttpd?
i am only aware of one memory leak in lighttpd in combination with mod_proxy. are you using mod_proxy? and can you attach your config? (you can obfuscate stuff if needed)

#7 Updated by Anonymous almost 10 years ago

Replying to darix:

what is your usage pattern for lighttpd?
i am only aware of one memory leak in lighttpd in combination with mod_proxy. are you using mod_proxy? and can you attach your config? (you can obfuscate stuff if needed)

No need to obfuscate; I just attached the config file. There's no mod_proxy.

The usage pattern is VERY light (only a few users per day), but essentially all the activity is downloads of huge files over a slow link. I did a bit of code browsing, and my guess is that chunk.c doesn't limit the length of the chunk queue. So the slow link backs up, the chunk queue grows to the size of the file, and lighty runs out of memory.

If that guess (and it's only a guess) is correct, there's no memory leak, just a failure to limit the queue length. I haven't dug deeply into the code yet to see whether that's true, nor to see how hard it will be to add a queue limit.
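The queue limit guessed at above can be sketched roughly as follows. This is a hypothetical Python model, not lighttpd's actual chunk.c; CHUNK_SIZE, MAX_QUEUED, and the class name are invented for illustration:

```python
from collections import deque

CHUNK_SIZE = 64 * 1024        # assumed backend read size, for illustration
MAX_QUEUED = 4 * CHUNK_SIZE   # assumed cap on the outgoing write queue

class ChunkQueue:
    """Toy model of an outgoing write queue with a size limit."""

    def __init__(self):
        self.chunks = deque()
        self.queued_bytes = 0

    def append(self, data):
        # Backend output is queued until the client connection drains it.
        self.chunks.append(data)
        self.queued_bytes += len(data)

    def pop_written(self, n):
        # Discard n bytes that have been written out to the client.
        while n and self.chunks:
            head = self.chunks[0]
            if len(head) <= n:
                n -= len(head)
                self.queued_bytes -= len(head)
                self.chunks.popleft()
            else:
                self.chunks[0] = head[n:]
                self.queued_bytes -= n
                n = 0

    def wants_more_input(self):
        # Backpressure: stop reading from the backend while the queue is
        # over the cap, instead of growing to the size of the file.
        return self.queued_bytes < MAX_QUEUED
```

With such a cap, a slow client link would pause backend reads once MAX_QUEUED bytes are pending, rather than letting the queue grow to the size of the file.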

-- geoff

#8 Updated by Anonymous almost 10 years ago

OK, I did a test and it's not a memory leak. I downloaded a 553M-ish file, and lighty went up to 552M in size, then shrank back to a thrifty 26M after the download was done. (Note that it didn't quite get to the size of the file; I think that's because some of the chunks went out over the net while the file was being read in.)

I think this should be easy to fix. I just need to understand how lighty's asynchrony works. Then I could make it stop reading the file when the queue got too big, and come back later.

Oh, one other thing. I keep talking about a file, but the bug is actually related to CGIs. As far as I can tell (I didn't write the Ruby code), our CGI stuffs the file directly to lighty, rather than using X-LIGHTTPD-send-file to get the data to go out. Obviously, that suggests an alternate fix on the Ruby side. But it's still a bug that lighty swallows whatever a broken CGI sends it, without limiting its memory usage.

(As a somewhat related comment, the crash is due to an assertion failure after a failed malloc. A web server should never crash due to a malloc failure; at an absolute minimum it should generate a log message, and really it should degrade gracefully. A relatively easy quick fix would be to replace assert with a macro that generates a log message before dying.)

-- geoff

#9 Updated by darix almost 10 years ago

configure lighttpd to use sendfile and the memory usage will be lower.

#10 Updated by Anonymous almost 10 years ago

Unfortunately, configuring sendfile doesn't help because the Rails version I'm using doesn't support it (nor should it, since sendfile is server-specific). In any case, that only works around the bug. It shouldn't be possible for a misbehaving CGI script to crash the server simply by supplying a large amount of output.

Fortunately, I was able to come up with a patch that mitigates the problem. I will attach it after I complete this comment. My change limits the size of the write queue, and stops reading input from the FastCGI script when it becomes excessively large. The downside of my patch is that the entire server process blocks (this is undoubtedly because I don't properly understand lighty's asynchrony mechanisms). However, if you set max_procs to an appropriate value in the fastcgi.server section of your config file, the blocked process won't be problematic because other processes will handle other users. I used max_procs = 10, since my server has few users despite serving very large files.
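The multi-process workaround described above would look roughly like this in lighttpd.conf (the socket path, URL prefix, and bin-path are hypothetical; note the directive is spelled max-procs in the config, while the comment above writes max_procs):

```conf
# Sketch: spawn several FastCGI backend processes so that one process
# blocked on a large slow download does not stall all other users.
fastcgi.server = ( "/app" =>
  (( "socket"    => "/tmp/app.fcgi.socket",
     "bin-path"  => "/srv/app/dispatch.fcgi",
     "max-procs" => 10 ))
)
```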

WARNING: Install this patch with caution. It will not crash your server, but it may make it inaccessible if lots of users are downloading large files at the same time. I doubt that this is the "correct" fix. However, I hope that this patch is useful to some people who are having this problem, and I hope it will help someone more knowledgeable to develop a better patch.

-- geoff

#11 Updated by stbuehler over 8 years ago

  • Status changed from New to Fixed
  • Resolution set to wontfix

OTOH we don't want to block the backend, as backends most often can only handle one request at a time (or need a thread for every request).

So the patch will not go upstream; I doubt we will change this in 1.4. Perhaps someone will fix this for mod_proxy_core in 1.5.

#12 Updated by stbuehler over 8 years ago

  • Status changed from Fixed to Wontfix

#13 Updated by gstrauss 10 months ago

  • Related to Bug #949: fastcgi, cgi, flush, php5 problem. added

#14 Updated by gstrauss 10 months ago

  • Description updated (diff)
  • Status changed from Wontfix to Patch Pending
  • Target version set to 1.4.40

New: asynchronous, bidirectional streaming support for request and response
Submitted pull request: https://github.com/lighttpd/lighttpd1.4/pull/66

Included in the pull request is buffering of large responses to temporary files instead of keeping the entire response in memory.

#15 Updated by gstrauss 10 months ago

Unfortunately, configuring sendfile doesn't help because the Rails version I'm using doesn't support it

BTW, having lighttpd use sendfile() is separate from the Rails backend. Also, a Rails app on the local machine can send the X-Sendfile header back to lighttpd instead of transferring the file over a socket/pipe, and lighttpd can then read the file directly from disk. This is separate from sendfile(), even though the names are similar.

The current HEAD of master contains patches which extend X-Sendfile header support to CGI and SCGI, in addition to FastCGI.
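For a FastCGI backend, the X-Sendfile support mentioned above must be enabled per backend. A sketch under the assumption of a lighttpd 1.4.40+ config (socket path and URL prefix are hypothetical):

```conf
# Allow this backend to use the X-Sendfile response header, so lighttpd
# serves the named file from disk instead of relaying it over the socket.
fastcgi.server = ( "/app" =>
  (( "socket"     => "/tmp/app.fcgi.socket",
     "x-sendfile" => "enable" ))
)
```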

#16 Updated by gstrauss 9 months ago

  • Status changed from Patch Pending to Fixed
  • % Done changed from 0 to 100
