lighttpd 1.4.42, 1.4.43 hangs during continuous polling
This sounds vaguely like issue #2771, but I get no error messages, and lighttpd continues to run; it just refuses further connections and shows no activity. (This has not been observed in 1.4.41 or earlier.)
While a client polls continuously for new data through the CGI channel, lighttpd eventually (after 10 to 30 minutes) stops exchanging data or accepting new connections. The setup is a single browser connection that uses PHP and CGI to load a dynamic page and then continuously refreshes its data through CGI.
The configuration can be made available, but essentially it proxies 'api/v1/...' to a local Node.js server on localhost:8080. This worked previously with 1.4.34, but our application now requires the improved proxy features of 1.4.40+ (min-buffered passthru, i.e., server.stream-request-body=2), hence the interest in upgrading.
Wireshark shows a similar picture: HTTP activity simply stops unexpectedly after an access.
Modules in use are mod_cgi, mod_fastcgi, mod_proxy, mod_access, and mod_auth.
Updated by gstrauss over 4 years ago
Hmm. Thanks for the report. I will look more closely at strace later.
Are the PHP requests going through CGI or FastCGI, or are you proxying to another backend? Would you share your lighttpd.conf?
There are some fixes post 1.4.43, so you might want to test with tip of lighttpd git master to see if the problem has already been fixed.
Updated by gstrauss over 4 years ago
- Status changed from New to Need Feedback
Looking at the strace, I see that after 16:38:18, lighttpd event loop is still waking up every second, checking the time, and then going back into the kernel for another second of epoll_wait() while lighttpd waits for new connections. That behavior looks proper and expected to me. lighttpd does not appear to be hung.
Are you sure that lighttpd has "hung"? Did you try to 'telnet localhost 80' on the same box to see if lighttpd is still responding? Is your networking still up? Did your firewall get reconfigured to block connections from the PHP (or other networks external to the machine)? Did SELinux or some other security mechanism start blocking or dropping external connections? (I know this is a stretch but lighttpd does not appear to be hung.) Look at your system to see what else is running that might potentially interfere.
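If it helps, the 'telnet localhost 80' check can be scripted so it is easy to repeat from the same box while the problem is occurring. The following is only a minimal sketch: the function name probe_http, the HEAD request, and the default host, port, and timeout are my assumptions, not anything taken from lighttpd or from your setup.

```python
import socket


def probe_http(host="127.0.0.1", port=80, timeout=5.0):
    """Return the HTTP status line, or None if the server does not answer.

    Hypothetical helper: a scripted stand-in for 'telnet localhost 80'.
    """
    data = b""
    try:
        # A connect failure or timeout here means new connections are not accepted.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n"
            sock.sendall(request.encode("ascii"))
            # Read until the first line of the response has arrived.
            while b"\r\n" not in data:
                chunk = sock.recv(1024)
                if not chunk:
                    break  # peer closed the connection
                data += chunk
    except OSError:
        # Covers refused connections, resets, and timeouts.
        return None
    if not data:
        return None
    return data.split(b"\r\n", 1)[0].decode("latin-1")


if __name__ == "__main__":
    status = probe_http()
    print(status if status else "no response (refused, timed out, or closed)")
```

A status line coming back (even an error status) would suggest lighttpd is still accepting and answering local connections, which points toward the network path or something else on the system. No response locally would be more interesting.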
Likely unrelated: looking at https://redmine.lighttpd.net/boards/2/topics/6956, you might want to check the placement of the parentheses around "php-num-procs" in your fastcgi.conf. I see the comment there, but the config still looks incorrect to me. The fastcgi.server value is a key => value list keyed by extension or path, where each value is itself a key => value list of host-label => (key => value list of host config):
fastcgi.server = (
    "ext1" => (
        "label1" => ( "host" => "127.0.0.1", ... ),
        "label2" => ( "host" => "127.0.0.1", ... ),
    ),
    "ext2" => (
        "label3" => ( "host" => "127.0.0.1", ... ),
        "label4" => ( "host" => "127.0.0.1", ... ),
    ),
)
The "label" is optional, so you could have something that looks like:
fastcgi.server = (
    "ext1" => (
        ( "host" => "127.0.0.1", ... ),
        ( "host" => "127.0.0.2", ... ),
    ),
    "ext2" => (
        ( "host" => "127.0.0.3", ... ),
        ( "host" => "127.0.0.4", ... ),
    ),
)
Updated by smikesmith over 4 years ago
Thanks greatly for your feedback. We use FastCGI for PHP, I think. That part of the config comes from someone else, so I'll have him look at your comments about it.
Yes, lighttpd was still running as shown by the strace, but by "locked", I meant that it simply stopped serving existing connections or taking new ones. In short, it stopped being a server, and the only recourse appeared to be to restart it.
As for 2775, so far the patch 9925202 seems to have cured the problem, and we're moving forward on testing. We need the feature to stream in minimized memory as we are pushing backups much larger than actual RAM size, so that will be a great feature. I'll post again if we find other problems, but meanwhile, looks like we're set!