Bug #1264
[closed] mod_cgi buffers data without bound (OOM on embedded system)
Description
I'm using 1.4.r1734 and I have a CGI that provides a continuous stream of data (motion JPEG images). If the client reads fast enough, everything is fine. If the client reads more slowly than the CGI writes, lighttpd's buffers grow until the (embedded) machine runs out of memory (OOM).
Clients connect and stay connected for days or weeks at a time, so if the CGI writes faster than the client reads, the machine will eventually OOM; it is just a matter of time. At the data rates in question, this can happen in under 30 seconds.
Lighttpd needs to remove the CGI pipe from the fd set when the buffer fills and add it back when the buffer drains. Growing the buffer or spooling the data to disk isn't really an option for streaming data delivery.
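A minimal sketch of that idea, assuming a single relay loop built on poll() with one client socket and one CGI pipe. The helper relay_build_pollset and its buffered/limit parameters are hypothetical names for illustration, not lighttpd's fdevent API. It relies on the POSIX rule that poll() ignores entries whose fd is negative, which takes the pipe out of the set without reallocating anything.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical helper (not lighttpd code): fill a two-entry poll set for one
 * CGI-to-client relay. Entry 0 is the client socket, entry 1 the CGI pipe.
 * 'buffered' is the number of bytes read from the CGI but not yet sent;
 * 'limit' is an assumed per-connection cap. */
static void relay_build_pollset(struct pollfd pfd[2], int client_fd, int cgi_fd,
                                size_t buffered, size_t limit)
{
    assert(limit > 0);

    /* Only ask for POLLOUT on the client when there is something to send. */
    pfd[0].fd = client_fd;
    pfd[0].events = buffered > 0 ? POLLOUT : 0;
    pfd[0].revents = 0;

    /* While the buffer is full, stop reading from the CGI: poll() ignores
     * entries with a negative fd, so the pipe is effectively removed. */
    pfd[1].fd = buffered < limit ? cgi_fd : -1;
    pfd[1].events = POLLIN;
    pfd[1].revents = 0;
}
```

In a loop, the set would be rebuilt before each poll() call: read from the pipe when pfd[1].revents has POLLIN, write to the client on POLLOUT. Once reading stops, the kernel pipe buffer fills and a CGI using blocking writes simply waits in write(), so backpressure reaches the producer instead of piling up in lighttpd's memory.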
Attached is a simple CGI shell script that demonstrates the problem (it needs a few .jpg images in the same directory to run).
To watch lighttpd grow without bound:
curl --limit-rate 100k http://localhost/pump.cgi > /dev/null
You can also see this in Firefox with an .html page like:
<html><body><img src="/pump.cgi" alt="Streaming image"></body></html>
Updated by ctaylor over 17 years ago
I attached my first attempt at a workaround for this issue: lighttpd.r1882.cgi-throttle.patch
I left in some debugging TRACE messages for testing; they should be removed or commented out if this is applied.
If high_watermark is non-zero and the number of pending bytes in send_raw is greater than high_watermark, the patch stops waiting on events for the CGI pipe file descriptor until the number of bytes in send_raw drops below low_watermark.
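The watermark rule can be sketched as a small hysteresis check. This is an illustration under assumptions, not code from lighttpd.r1882.cgi-throttle.patch: struct cgi_throttle, cgi_throttle_update and the paused flag are hypothetical names, and pending stands for the number of bytes queued in send_raw.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-response throttle state (names are illustrative only). */
struct cgi_throttle {
    size_t high_watermark; /* 0 disables throttling */
    size_t low_watermark;  /* resume reading below this many pending bytes */
    bool   paused;         /* true while the CGI pipe is out of the fd set */
};

/* Update the throttle with the current number of pending bytes and return
 * true if the CGI pipe should be watched for input. */
static bool cgi_throttle_update(struct cgi_throttle *t, size_t pending)
{
    if (t->high_watermark == 0)
        return true; /* throttling disabled: always read from the CGI */
    assert(t->low_watermark < t->high_watermark);

    if (!t->paused && pending > t->high_watermark)
        t->paused = true;  /* too much queued: stop reading */
    else if (t->paused && pending < t->low_watermark)
        t->paused = false; /* drained enough: start reading again */

    return !t->paused;
}
```

With high_watermark at 128K and low_watermark at 16K, the gap between the two thresholds keeps the pipe from being toggled in and out of the fd set on every small write.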
I've been testing with high_watermark set to 128K and low_watermark set to 16K. This buffering is on top of the TCP socket buffers, so the high_watermark doesn't really need to be that high.
So far this seems to help, but I'm still seeing the memory usage of lighttpd, serving a single long-running connection, slowly grow. This is not a memory leak, just an expansion of the working set. For a single connection, lighttpd really should reach a steady state fairly quickly, but so far it keeps growing slowly. I need to be able to pump data for weeks or months at a time, and a slowly increasing working set will OOM the machine given enough time.
Updated by ctaylor over 17 years ago
I ran an overnight test with 3 CGI connections: one at 100 kbps, one at 400 kbps and one at ~3-6 Mbps. After 13 hours, the lighttpd process had grown to 432M (resident) and its CPU use had grown from <1% to 90%.
The attached patch seems to constrain the size of the send_raw buffer.
This is not a memory leak (as confirmed by valgrind): all the allocated memory is being freed. The problem is that far too much memory is being allocated.
I recompiled with profiling and static modules and saw that, for 390.16MB transferred over ~10 minutes of server uptime, >99% of the runtime is spent in chunkqueue_steal_all_chunks() itself and not in its children chunkqueue_{steal_chunk,append_buffer,steal_tempfile,appendfile}, so the cost has to be in the chunk list traversal.
It appears that finished chunks are accumulating in the chunkqueues, though not at the same rate as the data is being pushed over the network.
With chunked.encoding disabled, it appears that finished chunks are accumulating in the con->send chunk queue. With chunked.encoding enabled, it appears that finished chunks are accumulating in the con->send_filters->last->cq chunkqueue.
With chunked.encoding disabled, if I add a call to chunkqueue_remove_finished_chunks(con->send) after the call to filter_chain_copy_output() in connection.c, lighttpd seems to stop growing.
With chunked.encoding enabled, I also have to remove finished chunks from con->send_filters->last->cq.
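A toy model of why finished chunks that are never unlinked make each call slower. It assumes, as the profile suggests, that the hot path walks every chunk still linked into the queue; toy_chunk, toy_chunkqueue and the toy_* functions are hypothetical and heavily simplified, not lighttpd's chunkqueue implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified chunk queue; NOT lighttpd's real types. */
typedef struct toy_chunk {
    size_t len;             /* bytes stored in this chunk */
    size_t offset;          /* bytes already written to the client */
    struct toy_chunk *next;
} toy_chunk;

typedef struct {
    toy_chunk *first;
    toy_chunk *last;
} toy_chunkqueue;

static int toy_chunk_is_finished(const toy_chunk *c)
{
    return c->offset >= c->len;
}

/* Append a new, unwritten chunk of len bytes at the tail. */
static void toy_append(toy_chunkqueue *cq, size_t len)
{
    toy_chunk *c = calloc(1, sizeof *c);
    if (c == NULL)
        abort();
    c->len = len;
    if (cq->last != NULL)
        cq->last->next = c;
    else
        cq->first = c;
    cq->last = c;
}

/* Record that n bytes were sent, consuming chunks from the head. */
static void toy_mark_written(toy_chunkqueue *cq, size_t n)
{
    for (toy_chunk *c = cq->first; c != NULL && n > 0; c = c->next) {
        size_t take = c->len - c->offset;
        if (take > n)
            take = n;
        c->offset += take;
        n -= take;
        assert(c->offset <= c->len);
    }
}

/* Stand-in for any operation that walks the whole list; returns the number
 * of chunks visited, i.e. the cost of one call. */
static size_t toy_walk(const toy_chunkqueue *cq)
{
    size_t visited = 0;
    for (const toy_chunk *c = cq->first; c != NULL; c = c->next)
        visited++;
    return visited;
}

/* Unlink and free fully written chunks from the head of the queue. */
static void toy_remove_finished(toy_chunkqueue *cq)
{
    while (cq->first != NULL && toy_chunk_is_finished(cq->first)) {
        toy_chunk *c = cq->first;
        cq->first = c->next;
        free(c);
    }
    if (cq->first == NULL)
        cq->last = NULL;
}
```

Without the removal step, the walk grows with every chunk ever sent, so the total work is quadratic in the number of chunks. Removing finished chunks after each copy keeps the queue short, which is consistent with lighttpd no longer growing once chunkqueue_remove_finished_chunks() is called.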
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 99.37    202.88   202.88   164843     0.00     0.00  chunkqueue_steal_all_chunks
  0.07    203.03     0.15   164869     0.00     0.00  connection_state_machine
  0.06    203.16     0.13        1     0.13   204.11  lighty_mainloop
  0.03    203.23     0.07   164843     0.00     0.00  filter_chain_copy_output
  0.03    203.30     0.07   164861     0.00     0.00  mod_cgi_read_response_content
  0.03    203.37     0.07   164886     0.00     0.00  joblist_append
  0.03    203.43     0.06   163480     0.00     0.00  fdevent_get_revents
  0.02    203.48     0.05   434149     0.00     0.00  chunkqueue_get_unused_chunk
  0.02    203.53     0.05   347648     0.00     0.00  chunkqueue_remove_finished_chunks
  0.02    203.58     0.05   255148     0.00     0.00  buffer_prepare_copy
  0.02    203.63     0.05   164843     0.00     0.00  plugins_call_handle_filter_response_content
  0.02    203.68     0.05   164843     0.00     0.00  plugins_call_handle_read_response_content
  0.02    203.72     0.04   163608     0.00     0.00  fdevent_poll
  0.02    203.76     0.04   163608     0.00     0.00  fdevent_poll_poll
  0.02    203.79     0.04    87763     0.00     0.00  fdevent_poll_event_add
  0.01    203.82     0.03   259831     0.00     0.00  chunk_free
  0.01    203.85     0.03    87607     0.00     0.00  network_write_chunkqueue
  0.01    203.87     0.02   250550     0.00     0.00  network_read_chunkqueue_read
  0.01    203.89     0.02   164868     0.00     0.00  plugins_call_handle_joblist
  0.01    203.91     0.02   164860     0.00     0.00  cgi_demux_response
  0.01    203.93     0.02   164843     0.00     0.00  mod_chunked_encode_response_content
  0.01    203.95     0.02    87607     0.00     0.00  network_write_chunkqueue_linuxsendfile
  0.01    203.97     0.02    87607     0.00     0.00  network_write_chunkqueue_writev_mem
  0.01    203.99     0.02    86959     0.00     0.00  cgi_copy_response
  0.01    204.00     0.02       12     0.00     0.00  network_read
  0.01    204.02     0.02   165009     0.00     0.00  fdevent_event_del
  0.00    204.03     0.01   523478     0.00     0.00  buffer_free
  0.00    204.04     0.01   434186     0.00     0.00  chunk_reset
  0.00    204.05     0.01   434145     0.00     0.00  chunkqueue_get_append_buffer
  0.00    204.06     0.01   259831     0.00     0.00  chunk_init
  0.00    204.07     0.01   163480     0.00     0.00  fdevent_poll_get_revents
  0.00    204.08     0.01   163480     0.00     0.00  fdevent_revents_reset
  0.00    204.09     0.01      137     0.00     0.00  cgi_connection_close_callback
  0.00    204.10     0.01        4     0.00     0.00  chunkqueue_get_prepend_buf ...
Updated by gstrauss over 8 years ago
- Description updated
- Assignee deleted (jan)
- Missing in 1.5.x set to Yes
lighttpd 1.4.40 buffers large request and/or response to temp files on disk.
Updated by hsiaoairplane over 7 years ago
gstrauss wrote:
lighttpd 1.4.40 buffers large request and/or response to temp files on disk.
But on embedded systems, buffering large requests or responses to temp files is still a risk.
Updated by gstrauss over 7 years ago
In lighttpd 1.4.x:
See Server_stream-response-bodyDetails
See also cgi.x-sendfile in Docs_ModCGI
As noted in #2803, hsiaoairplane has not read the doc or the release notes.
hsiaoairplane even posted to this obsolete ticket for lighttpd 1.5.0.
@hsiaoairplane, please stop posting to the issue tracker. Your questions are better suited to the support forum https://redmine.lighttpd.net/projects/lighttpd/boards/2 after you have spent a little more time looking over the lighttpd release notes and documentation. It would probably save you a bit of time, as well as inform you about lighttpd's features, especially those targeting embedded systems with low memory.