Bug #1829 (Closed)
lighttpd leaves CLOSE_WAIT connections
Description
Hello,
I did some research about connections in CLOSE_WAIT state. Most of the sources on the internet tell me that it's some fault in the application.
The application is lighttpd 1.4.20 (1.4.19 did the same thing), with the lighttpd-angel process for easy reloading.
lsof:
lighttpd 54571 lighttpd 4u IPv4 4809493 TCP *:http (LISTEN)
lighttpd 54571 lighttpd 7u IPv4 4812372 TCP xx.yy.zz.84:http->xx.yy.zz.15:16800 (CLOSE_WAIT)
lighttpd 54571 lighttpd 9u IPv4 4815063 TCP xx.yy.zz.20:http->xx.yy.zz.15:nm-game-server (ESTABLISHED)
lighttpd 54571 lighttpd 10u IPv4 4810032 TCP xx.yy.zz.86:http->xx.yy.zz.15:10246 (CLOSE_WAIT)
lighttpd 54571 lighttpd 11u IPv4 4815303 TCP xx.yy.zz.21:http->xx.yy.zz.22:51661 (ESTABLISHED)
lighttpd 54571 lighttpd 13u IPv4 4811472 TCP xx.yy.zz.86:http->xx.yy.zz.15:24220 (CLOSE_WAIT)
lighttpd 54571 lighttpd 14u IPv4 4810666 TCP xx.yy.zz.86:http->xx.yy.zz.15:10272 (CLOSE_WAIT)
lighttpd 54571 lighttpd 16u IPv4 4809742 TCP xx.yy.zz.86:http->xx.yy.zz.15:10230 (CLOSE_WAIT)
lighttpd 54571 lighttpd 17u IPv4 4812380 TCP xx.yy.zz.84:http->xx.yy.zz.15:16804 (CLOSE_WAIT)
lighttpd 54571 lighttpd 18u IPv4 4810306 TCP xx.yy.zz.86:http->xx.yy.zz.15:axis-wimp-port (CLOSE_WAIT)
lighttpd 54571 lighttpd 21u IPv4 4809750 TCP xx.yy.zz.86:http->xx.yy.zz.15:10235 (CLOSE_WAIT)
lighttpd 54571 lighttpd 24u IPv4 4811848 TCP xx.yy.zz.86:http->xx.yy.zz.15:24232 (CLOSE_WAIT)
lighttpd 54571 lighttpd 26u IPv4 4811184 TCP xx.yy.zz.86:http->xx.yy.zz.15:24204 (CLOSE_WAIT)
lighttpd 54571 lighttpd 30u IPv4 4813798 TCP xx.yy.zz.84:http->xx.yy.zz.15:28490 (CLOSE_WAIT)
lighttpd 54571 lighttpd 31u IPv4 4813328 TCP xx.yy.zz.84:http->xx.yy.zz.15:28461 (CLOSE_WAIT)
lighttpd 54571 lighttpd 33u IPv4 4812662 TCP xx.yy.zz.84:http->xx.yy.zz.15:28437 (CLOSE_WAIT)
lighttpd 54571 lighttpd 36u IPv4 4813533 TCP xx.yy.zz.84:http->xx.yy.zz.15:28481 (CLOSE_WAIT)
lighttpd 54571 lighttpd 41u IPv4 4814131 TCP xx.yy.zz.84:http->xx.yy.zz.15:29531 (CLOSE_WAIT)
Updated by darix almost 16 years ago
- Status changed from New to Invalid
This is not really an issue. CLOSE_WAIT is a normal state in the connection life cycle.
Updated by GeorgH almost 16 years ago
- Status changed from Invalid to Reopened
True. But these CLOSE_WAIT connections stay open for hours, days, maybe forever, which should not be part of the life cycle.
Maybe it has to do with the fcgi backends, but I'm quite sure there is some bug.
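For background: a TCP socket sits in CLOSE_WAIT from the moment the peer sends its FIN until the owning process calls close() on the descriptor, so connections that stay in CLOSE_WAIT for hours mean the server process is still holding the file descriptor. A minimal C sketch of that sequence (hypothetical server-side code, not taken from lighttpd; port 8080 is arbitrary):

    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(8080);          /* arbitrary test port */
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(lfd, 16);

        int cfd = accept(lfd, NULL, NULL);    /* client connects, then disconnects */

        /* The client closes its end: the kernel ACKs the FIN and moves this
         * socket to CLOSE_WAIT.  read() now returns 0 (EOF). */
        char buf[512];
        while (read(cfd, buf, sizeof(buf)) > 0)
            ;

        /* As long as we do NOT call close(cfd), lsof / netstat will keep
         * showing this connection in CLOSE_WAIT -- exactly the symptom above. */
        sleep(3600);

        close(cfd);                           /* only now does CLOSE_WAIT go away */
        close(lfd);
        return 0;
    }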
Updated by GeorgH almost 16 years ago
I've tried to reproduce the whole thing in a very simple setup. The point is, most of this is reproducible by setting up a machine with lighttpd and PHP via FastCGI. I've created a .php file containing only a sleep() and then a phpinfo();
I'm accessing this page via wget, which times out right before the sleep finishes. Under these circumstances I can reproduce the CLOSE_WAIT state.
But I'm still not 100% sure if this is exactly the same thing I get in the production environment.
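A rough sketch of the client side of that reproduction (hypothetical C stand-in for wget with a timeout; host, port, and the script name sleep.php are assumptions): the client sends the request and closes its socket before the slow FastCGI backend answers, so lighttpd receives the FIN while the request is still being handled.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in srv;
        memset(&srv, 0, sizeof(srv));
        srv.sin_family = AF_INET;
        srv.sin_addr.s_addr = inet_addr("127.0.0.1");  /* lighttpd host (assumed) */
        srv.sin_port = htons(80);

        if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) != 0) {
            perror("connect");
            return 1;
        }

        /* sleep.php is the test script described above: sleep() + phpinfo() */
        const char *req =
            "GET /sleep.php HTTP/1.0\r\n"
            "Host: localhost\r\n"
            "\r\n";
        write(fd, req, strlen(req));

        /* Give up before the backend's sleep() finishes, like wget's timeout. */
        sleep(2);
        close(fd);   /* our FIN arrives while lighttpd is still in handle-req */
        return 0;
    }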
Updated by Olaf-van-der-Spek almost 16 years ago
What is the problem?
You don't like CLOSE_WAIT?
Updated by GeorgH almost 16 years ago
The problem is still that I have two nice webservers: one has 2090 and the other 1610 connections in CLOSE_WAIT state. The connections simply never close down :-(
I still hope to find a solution to this phenomenon, which I think may come from the lighttpd-angel process.
Updated by stbuehler almost 16 years ago
- Check the status page (mod_status) for connections in the "write" state which have written everything and match the CLOSE_WAIT connections; find out which modules handled these requests (fastcgi? upload_progress?).
- Are you using 3rd-party patches like mod_deflate?
Updated by GeorgH almost 16 years ago
The state seen with mod_status is "handle-req" on all the connections that stay open.
Nearly all requests are handled by fastcgi in my setup, as these machines only handle the dynamically generated traffic.
I'm not using mod_deflate on these servers. But as I see now, mod_compress is still activated from some earlier tests; the cache dir is empty, though.
Perhaps the list of modules from the mod_status output helps?
indexfile
access
alias
rewrite
fastcgi
status
compress
fastcgi
accesslog
dirlisting
staticfile
Updated by stbuehler almost 16 years ago
Strange - fastcgi is listed twice. If the requests hang in handle-req, it may be a problem with overloaded fastcgi backends.
Updated by GeorgH almost 16 years ago
Thanks for the hint. fastcgi really was loaded twice, once through modules.conf and once through conf.d/fastcgi.conf. I've fixed this.
Nevertheless, I still get fresh CLOSE_WAIT connections.
How exactly would I find out about an overloaded backend? As far as I can see, I could still handle more than 200 additional connections per backend:
$ lsof -i :2000 | grep ^php5 | grep -v LISTEN -c
41
$ lsof -i :2000 | grep ^php5 | grep LISTEN -c
257
I'll get back when I have more information.
Updated by GeorgH over 15 years ago
As I could not find any solution to the problem, I've switched to version 1.5. Now the problem is gone for me.
Updated by gstrauss over 8 years ago
There probably should be a timeout for socket connections to backends when lighttpd is waiting to write to or read from them. If there is a timeout on the backend, maybe lighttpd should set SO_LINGER enabled with timeout 0 before closing, so that the TCP connection gets reset (TCP RST).
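As a sketch of that idea (generic socket code, not lighttpd's actual implementation): enabling SO_LINGER with a linger time of 0 makes the subsequent close() send an RST instead of a normal FIN, which tears the connection down immediately on both sides and discards any unsent data, which is acceptable when aborting a timed-out backend.

    #include <sys/socket.h>
    #include <unistd.h>

    /* Abortive close: close() after SO_LINGER{on, 0s} emits a TCP RST
     * instead of a FIN, so there is no FIN/ACK handshake and no lingering
     * TCP state on either end. */
    static void abortive_close(int fd) {
        struct linger lin;
        lin.l_onoff  = 1;   /* linger enabled       */
        lin.l_linger = 0;   /* ...with a 0s timeout */
        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
        close(fd);          /* kernel sends TCP RST */
    }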
Updated by gstrauss over 8 years ago
@GeorgH: this might be fixed in a recent pull request:
https://github.com/lighttpd/lighttpd1.4/pull/53 improves the control flow logic in dynamic handlers, and so they will abort the connection to the backend after configured timeouts. (Note there is more work that needs to be done to add additional timeout configuration options.)
Any chance you can test this? (I realize that it is 7 1/2 years since you reported the issue, but figured I'd ask anyway :))
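To illustrate the kind of timeout being discussed (a generic poll()-based sketch, not the code from the pull request; the function name and the idle_limit parameter are made up for illustration): track when the backend last made progress and abort the connection once a configured limit is exceeded.

    #include <poll.h>
    #include <time.h>
    #include <unistd.h>

    /* Wait for data from a backend fd, giving up after idle_limit seconds
     * without progress.  Returns 0 when data is ready, -1 on timeout/error.
     * (Illustrative only; lighttpd's real state machine is event-driven.) */
    static int wait_for_backend(int fd, int idle_limit) {
        struct pollfd p = { .fd = fd, .events = POLLIN };
        time_t start = time(NULL);

        for (;;) {
            int r = poll(&p, 1, 1000);                 /* wake up every second   */
            if (r > 0 && (p.revents & POLLIN))
                return 0;                              /* backend sent something */
            if (r < 0)
                break;                                 /* poll error             */
            if (time(NULL) - start >= idle_limit)
                break;                                 /* configured timeout hit */
        }
        close(fd);                                     /* abort the backend conn */
        return -1;
    }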
Updated by gstrauss over 8 years ago
- Related to Bug #1149: handle-req time too long added
Updated by GeorgH over 8 years ago
I'm sorry. I guess the system where this problem occurred doesn't exist anymore... for a few years now.
Updated by gstrauss over 8 years ago
- Category set to core
- Status changed from Reopened to Fixed
- Target version set to 1.4.x
I believe this issue has since been fixed by historical commits, potentially this one:
commit d00e1e79b94e0f4da35292d2293595dac02993c7
Author: Stefan Bühler <stbuehler@web.de>
Date:   Sat Feb 7 13:32:54 2015 +0000

    [connections] fix bug in connection state handling

    if a request was finished (con->file_finished = 1) and the state machine
    was triggered, but the write queue was empty, it didn't actually finish
    the request.

    From: Stefan Bühler <stbuehler@web.de>

    git-svn-id: svn://svn.lighttpd.net/lighttpd/branches/lighttpd-1.4.x@2973 152afb58-edef-0310-8abb-c4023f1b3aa9
Other commits related to closing connections: svn r2636 (476c5d48) and svn r2645 (20c4cd55)
@GeorgH: Thanks for your response. I was unable to reproduce your issue against the tip of master, following your description above with a wget timeout and a FastCGI script that waited a little bit longer. (Sorry, building 1.4.20 to attempt a repro would require quite a bit more work, since I do not have a supporting toolchain available.)
I tried running tip-of-master lighttpd 1.4.x with the linux-sysepoll and poll backends, and tried varying the size of the returned document. I could see with strace that anything larger than a 4k page size resulted in EPIPE when attempting to writev() to the wget client, which had already closed the connection. For smaller documents the writev() succeeded, then the shutdown() failed with ENOTCONN, and then the connection was closed. Thankfully, it looks as if this bug was fixed at some point in the past.
If anyone is still having this issue, please reopen the ticket and test https://github.com/lighttpd/lighttpd1.4/pull/53
Thanks.