[Solved] After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. (not lighttpd issue)

Added by ouyang.wei.victor over 1 year ago

I tested lighttpd 1.4.69 and 1.4.73 and found that lighttpd would no longer respond to web page requests after running for about forty minutes. My operating system is embedded linux, the browser I use is chrome, and I use http2 and openssl.
Thanks.


Replies (37)

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss over 1 year ago

Nov 6 14:34:40 sic daemon.err lighttpd[8856]: (mod_openssl.c.3367) SSL: 1 error:0A000126:SSL routines::unexpected eof while reading (10.169.34.139)

That is an error from the openssl TLS library, but is not necessarily a hard error since clients might not properly handle TLS alert CLOSE_NOTIFY.

What version of openssl are you running?

I tested lighttpd 1.4.69 and 1.4.73 and found that lighttpd would no longer respond to web page requests after running for about forty minutes. My operating system is embedded linux, the browser I use is chrome, and I use http2 and openssl.

Did a prior version of lighttpd work on your system with http2 and openssl?

My operating system is embedded linux

Since you're on an embedded system, if you're serving static files instead of CGI, have you tried setting server.network-backend = "writev" instead of the default ("sendfile")?

Since you're on an embedded system, have you tried setting server.max-connections = 16?

Have you tried testing using a TLS library aimed at embedded systems? lighttpd supports mbedtls with lighttpd mod_mbedtls, and lighttpd supports wolfssl with lighttpd mod_wolfssl. Since your config specifies MinProtocol TLSv1.3, try lighttpd mod_wolfssl, as the underlying wolfssl supports TLSv1.3, whereas mbedtls 2.x does not support TLSv1.3 (though later versions of mbedtls 3.x might).

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by ouyang.wei.victor over 1 year ago

Thanks for your answer. I am using openssl 3.1.0. I noticed that lighttpd works normally for a period of time after it is started. But an exception will occur after a while. At this time, lighttpd seems to be stuck in an infinite loop and cannot handle new requests. This exception appears to be caused by a socket file handle exception.

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss over 1 year ago

But an exception will occur after a while.

vague

At this time, lighttpd seems to be stuck in an infinite loop and cannot handle new requests.

vague. Are you able to get a stack trace? Attach with gdb and issue command bt full. Or at least man pstack.

This exception appears to be caused by a socket file handle exception.

vague. Are you able to capture strace of the lighttpd process to demonstrate what you are calling "socket file handle exception"?

Please try to share more detailed information for troubleshooting.

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss over 1 year ago

What is the pattern of requests that might be in progress on the HTTP/2 connection?
The stack trace tells me that you have enabled debug logging and that specific stack is in the middle of evaluating mod_redirect config for the request. That code will not get into an infinite loop, though that might be part of a larger loop if lighttpd is spinning on the CPU. Please capture a few more stack traces, and try to catch strace when the exception occurs. Simply saying "an exception occurs" is useless to me. Please try to provide more detailed information.

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss over 1 year ago

In the top-level config you shared, the only use of mod_redirect is in

$SERVER["socket"] == ":80" {
        $HTTP["host"] =~ "(.*)" {
                url.redirect = ( "^/(.*)" => "https://%1/$1" )
        }
}

so the condition evaluation result from the stack trace should be false on a clear-text connection on port 80.

If you temporarily disable this block of code in your lighttpd.conf and restart lighttpd, what does the stack trace look like after an exception? gdb bt full might be more useful to me.

BTW, in modern lighttpd, you could use simpler config to redirect from HTTP to HTTPS. See HowToRedirectHttpToHttps

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by ouyang.wei.victor over 1 year ago

I modified the configuration file and deleted the configuration related to port 80, but after running for a period of time (about an hour), the program crashed. I can't print any useful stack information using gdb; it looks like the stack has been corrupted.

The logs show that web requests are sent at a fixed interval. This comes from the automatic refresh I set up in the HTML page, and there are no other operations besides this.
Before the crash, the program runs without any exceptions, and CPU and memory usage remain low.
Could you give me further suggestions?

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss over 1 year ago

Please see How to get support
You have not shared your complete lighttpd config.

How is lighttpd installed on your system? Is it a package or a custom build? Have you customized anything in the code? Any custom modules?
-m weblib/lib suggests to me that this is a custom build.
If you're using -m, then full path is recommended, e.g. -m $PWD/weblib/lib
Also, if you're doing a custom build, then make sure that you build and run the software consistently. If you build against a custom build of openssl, then make sure that lighttpd uses those specific libraries (or ABI-compatible versions) at runtime. You can generally build against an older version of a library, e.g. v1.2.3, and then run with a later compatible version of the library when only the patch number has changed, e.g. v1.2.6. However, building against v1.2.6 and running against the older v1.2.3 libraries might not be compatible.

Do you have any cron jobs that signal lighttpd at any regular interval?
Have you tried running strace -o /path/to/strace.log on lighttpd to see what system calls are made prior to the crash?

Could you give me further suggestions?

If this is reproducible for you, you could run lighttpd under gdb until it crashes and then get a stack trace. Alternatively, you can configure your system to save cores when a program crashes.

As a coarse approach, you could try disabling HTTP/2 support in lighttpd to see if the crash occurs when using HTTP/2 or not.
See server.feature-flags
server.feature-flags += ("h2proto" => "disable")

Aside: unless your /resource/normal.gif changes frequently, you might consider using caching headers with mod_expire.

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by ouyang.wei.victor about 1 year ago

I've uploaded all my configuration files and latest logs. lighttpd is custom-built. I only added logs in a few locations for debugging, and did not customize other content or modules. I'm sure the openssl build and runtime libraries are the same.
My web page requests a CGI from lighttpd every 5 seconds to refresh the data. When more than three web pages are opened at the same time, the program will run normally at first, but after a period of time, the CPU usage will suddenly reach 100%, and lighttpd will become unresponsive, and then collapse.
You can see from the logs I uploaded that at the time of 10:33:06 (line 402), the program seems to be blocked and does not continue to poll the cgi status as above. I added these log messages myself. See the picture below for details.

I have disabled HTTP/2 and added mod_expire, but it seems to have no effect.

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by ouyang.wei.victor about 1 year ago

Sorry, perhaps because the CPU is saturated, I can't capture a valid crash stack. The ones I have captured so far are like the one above, which gdb reports as a corrupt stack. Since the storage capacity of the embedded board is very small, strace has not been ported to it.

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss about 1 year ago

The 'messages' file you attached is using HTTP/2, but the configs you attached disable HTTP/2. These do not match.

server.max-connections = 8

That is appropriate for an embedded system. However, if there are 8 connections to lighttpd, and all are waiting for responses from a backend CGI, then lighttpd will not accept new connections. lighttpd will continue to accept requests on existing connections.

If you are using HTTP/1.x, you can set server.max-keep-alive-requests = 0 to disable keep-alive requests.
server.max-keep-alive-requests
server.max-keep-alive-idle

Is your CGI responding?
mod_cgi
If you expect your CGI to respond in < 2 seconds
cgi.limits = ("write-timeout" => 2, "read-timeout" => 2, "tcp-fin-propagate" => "SIGTERM")

Is /tmp filling up?
Is your system running out of memory? If so, is the OOM killer killing the lighttpd process?
Is lighttpd spinning at 100% on the CPU, but no other resources exhausted?

How efficient is your CGI program? If you can reproduce the issue with three simultaneous requests, try replacing your CGI script with something simpler and see if you can still reproduce the issue:

#!/bin/sh
printf "Status: 204\n\n" 

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss about 1 year ago

but after a period of time, the CPU usage will suddenly reach 100%, and lighttpd will become unresponsive, and then collapse.
You can see from the logs I uploaded that at the time of 10:33:06 (line 402), the program seems to be blocked and does not continue to poll the cgi status as above.

I do not think that you're reading that correctly. lighttpd continues to function, e.g. the next request 5 seconds later is logged. lighttpd is waiting for the backend to respond and not receiving any events. Your logs demonstrate that lighttpd is not blocked.

Nov 14 10:33:06 sic daemon.err lighttpd[6353]: (response.c.856) cgi handle_subrequest ret 2
Nov 14 10:33:06 sic daemon.err lighttpd[6353]: (response.c.868) return HANDLER_WAIT_FOR_EVENT 
Nov 14 10:33:11 sic daemon.err lighttpd[6353]: (h2.c.1572) fd:9 id:353 rqst: :method: GET 
Nov 14 10:33:11 sic daemon.err lighttpd[6353]: (h2.c.1572) fd:9 id:353 rqst: :authority: 10.169.34.160 
Nov 14 10:33:12 sic daemon.err lighttpd[6353]: (h2.c.1572) fd:9 id:353 rqst: :scheme: https 
Nov 14 10:33:12 sic daemon.err lighttpd[6353]: (h2.c.1572) fd:9 id:353 rqst: :path: /cgi_main_monitor.cgi?pageType=2&devIndex=7&commValue1=0&commValue2=0&commValue3=0 

Thus far, it appears to me that the problem is with your CGI program and whatever it is doing. You might want to add some concurrency control, e.g. a lock file on disk, so that it only handles one request at a time. I have a hunch that something your CGI program is executing is a program or device on the system that is not reentrant and does not support parallel actions.

If you could strace your CGI program, you might see that. Or if you can attach a debugger you might see that.
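
A minimal sketch of the lock-file idea above, assuming the CGI can be wrapped by a POSIX shell script. The names with_lock, LOCKDIR, and LOCK_TRIES are illustrative and do not come from lighttpd or from this thread. The lock relies on mkdir being atomic, so only one caller at a time gets past it.

```shell
#!/bin/sh
# Hypothetical lock location and retry budget; adjust for the real system.
LOCKDIR=${LOCKDIR:-/tmp/cgi_main_monitor.lock}
LOCK_TRIES=${LOCK_TRIES:-5}

# Run the given command while holding the lock.
# Returns 1 without running the command if the lock cannot be acquired.
with_lock() {
    tries=0
    # mkdir either creates the directory or fails; two callers cannot both succeed
    until mkdir "$LOCKDIR" 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$LOCK_TRIES" ]; then
            return 1
        fi
        sleep 1
    done
    rc=0
    "$@" || rc=$?
    # release the lock whether or not the command succeeded
    rmdir "$LOCKDIR"
    return "$rc"
}
```

A wrapper CGI could then run something like with_lock /path/to/real.cgi and print a "Status: 503" response when with_lock returns non-zero. If the lock holder is killed before rmdir runs, the directory is left behind and has to be removed by hand, which is a known weakness of this simple approach.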

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by ouyang.wei.victor about 1 year ago

I tried a very simple CGI program and a simple HTML page, but the problem still occurred. Memory usage stayed normal the whole time.

cgi.limits = ("write-timeout" => 2, "read-timeout" => 2, "tcp-fin-propagate" => "SIGTERM")

I also modified this, but the problem still exists.

https://redmine.lighttpd.net/boards/2/topics/8091?r=8102#message-8102
I noticed that this problem was exactly the same as mine, so I tried to modify it according to this, but there was still a problem.

server.stream-request-body = 1
ssl.read-ahead = "disable"
server.stream-response-body = 1
server.chunkqueue-chunk-sz = 32768

I also noticed that network connections no longer seemed to be processed after the problem occurred.

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss about 1 year ago

https://redmine.lighttpd.net/boards/2/topics/8091?r=8102#message-8102
I noticed that this problem was exactly the same as mine, so I tried to modify it according to this, but there was still a problem.

That is because the issue you are seeing is unrelated to a bug that was fixed over 5 years ago in lighttpd 1.4.50.


Have you been able to configure your system to save core files? Then, when the issue occurs, pstack or send lighttpd process a kill -SIGQUIT to force lighttpd to dump core (and exit)

Have you tried the same lighttpd configuration running on a virtual machine in a test environment? Have you reproduced the issue outside of your development board?

How can I try to reproduce this? Please share your build configuration on how you built lighttpd and what compiler (and compiler version!) you used.

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss about 1 year ago

Did a prior version of lighttpd work on your system?

If you disabled HTTP/2 and still had an issue, have you tried using http (port 80) instead of https (port 443) to see if you can still reproduce the issue?

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss about 1 year ago

Posting pictures of text is not very technical. Do you know how to use <pre> tags in your posts? You can paste text in between <pre> and </pre> rather than posting poorly cropped screenshots of text.


On my x86_64 laptop, lighttpd serves about 1900 requests per second using mod_openssl and mod_cgi.
cat lighttpd.conf

server.document-root = "/dev/shm/" 
server.port := 8443
server.modules += ("mod_openssl")
ssl.engine = "enable" 
ssl.pemfile = "/path/to/cert.pem" 
ssl.privkey = "/path/to/key.pem" 
server.modules += ("mod_cgi")
cgi.assign = ("" => "")

cat foo.cgi
#!/bin/sh
printf "Status: 204\n\n" 

My test command runs h2load with 100 clients and 4 threads:
h2load -n 10000 -c 100 -t 4 -m 8 https://localhost:8443/foo.cgi

While I believe you that you are seeing different behavior on your specific embedded system, I am going to ask that you collect more information on your system, and help me to reproduce this. You need to be able to get strace or a stack dump or a debugger attached so that we have a better idea where lighttpd is spending time on the CPU.

My operating system is embedded linux

What is the kernel version?

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by ouyang.wei.victor about 1 year ago

What is the kernel version?

My linux kernel version is 3.12.0

Have you been able to configure your system to save core files? Then, when the issue occurs, pstack or send lighttpd process a kill -SIGQUIT to force lighttpd to dump core (and exit)

Due to current hardware limitations, I am unable to grab a core dump file. But I attached gdb to capture the information when the problem occurred.
For this test I used HTTP/1.0 and removed https.

[root@sic /]$gdb attach  7372

GNU gdb (GDB) 7.5
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" 
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
attach: No such file or directory.
Attaching to process 7372
Reading symbols from /lighttpd...done.
Reading symbols from /home/lib/libpcre2-8.so.0...(no debugging symbols found)...done.
Loaded symbols for /home/lib/libpcre2-8.so.0
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libcrypto.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.3
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux-armhf.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux-armhf.so.3
Reading symbols from /lib/libatomic.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libatomic.so.1
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.

warning: File "/lib/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from home/web/lib/mod_auth.so...(no debugging symbols found)...done.
Loaded symbols for home/web/lib/mod_auth.so
Reading symbols from home/web/lib/mod_authn_file.so...(no debugging symbols found)...done.
Loaded symbols for home/web/lib/mod_authn_file.so
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from home/web/lib/mod_cgi.so...done.
Loaded symbols for home/web/lib/mod_cgi.so
Reading symbols from home/web/lib/mod_h2.so...done.
Loaded symbols for home/web/lib/mod_h2.so
Reading symbols from home/web/lib/mod_openssl.so...(no debugging symbols found)...done.
Loaded symbols for home/web/lib/mod_openssl.so
Reading symbols from /lib/libssl.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libssl.so.3

warning: File "/lib/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0xb6b3b5b4 in ?? () from /lib/libc.so.6
(gdb)
(gdb) bt full
#0  0xb6b3b5b4 in ?? () from /lib/libc.so.6
No symbol table info available.
#1  0xb6b9707c in fork () from /lib/libc.so.6
No symbol table info available.
#2  0xb6bac710 in ?? () from /lib/libc.so.6
No symbol table info available.
#3  0xb6bac658 in posix_spawn () from /lib/libc.so.6
No symbol table info available.
#4  0x00033428 in fdevent_fork_execve (name=0x160cc8 "/home/web/pages-cgi/cgi_main_monitor.cgi", argv=0xbef82970, envp=0x144a78, fdin=-1, fdout=19, fderr=-1,
    dfd=23) at fdevent.c:603
        sigs = {__val = {3674624, 0 <repeats 31 times>}}
        file_actions = {__allocated = 8, __used = 1, __actions = 0x174788, __pad = {0 <repeats 16 times>}}
        attr = {__flags = 12, __pgrp = 0, __sd = {__val = {3674624, 0 <repeats 31 times>}}, __ss = {__val = {0 <repeats 32 times>}}, __sp = {__sched_priority = 0},
          __policy = 0, __pad = {0 <repeats 16 times>}}
        rc = 0
        pid = -1
#5  0xb6aa4b34 in cgi_create_env (r=0x15f6c0, p=0xb5d10, hctx=0x174700, cgi_handler=0x624e0) at mod_cgi.c:1039
        cgi_fds = {-1, -1, 17, 19}
        to_cgi_fds = 0xbef8297c
        from_cgi_fds = 0xbef82984
        bufsz_hint = 16384
        env = 0xb5d50
        args = {0x160cc8 "/home/web/pages-cgi/cgi_main_monitor.cgi", 0x0, 0xbef829b0 ""}
        envp = 0x144a78
        dfd = 21
        serrh_fd = -1
        pid = -1091032664
#6  0xb6aa520e in mod_cgi_handle_subrequest (r=0x15f6c0, p_d=0xb5d10) at mod_cgi.c:1253
        p = 0xb5d10
        hctx = 0x174700
        rd_revents = 0
        wr_revents = 0
        cq = 0x15f8c0
#7  0x00014cf6 in http_response_handler (r=0x15f6c0) at response.c:857
        p = 0xb5010
        rc = 0
#8  0x00015e90 in connection_state_machine_loop (r=0x15f6c0, con=0x15f6c0) at connections.c:662
        ostate = CON_STATE_REQUEST_START
#9  0x000162e8 in connection_state_machine (con=0x15f6c0) at connections.c:832
        rc = 1
        r = 0x15f6c0
#10 0x000187ac in network_server_handle_fdevent (context=0x5ccc8, revents=1) at network.c:95
        fd = 13
        con = 0x15f6c0
        srv_socket = 0x5ccc8
        srv = 0x5c0b8
        loops = 5
        nagle_disable = 1
        addr = {ipv6 = {sin6_family = 2, sin6_port = 37338, sin6_flowinfo = 2334304522, sin6_addr = {__in6_u = {
                __u6_addr8 = "\000\000\000\000\000\000\000\000\\\247\004\000\350\257\005", __u6_addr16 = {0, 0, 0, 0, 42844, 4, 45032, 5}, __u6_addr32 = {0, 0,
                  304988, 372712}}}, sin6_scope_id = 182001}, ipv4 = {sin_family = 2, sin_port = 37338, sin_addr = {s_addr = 2334304522},
            sin_zero = "\000\000\000\000\000\000\000"}, un = {sun_family = 2,
            sun_path = "\332\221\n\251\"\213\000\000\000\000\000\000\000\000\\\247\004\000\350\257\005\000\361\306\002\000\034\340\004\000P\b\020\000P\b\020\000\240\263\005\000\230\263\005\000\350\257\005\000\240\263\005\000\320*\370\276\320*\370\276\\\247\004\000a\001\000\000\274\245\004\000\230\263\005\000\240\263\005\000_\000\000\000\370*\370\276\061\310\002\000 +\370\276\001\000\000\000\017\000\000\000a\001"}, plain = {sa_family = 2,
            sa_data = "\332\221\n\251\"\213\000\000\000\000\000\000\000"}}
        addrlen = 16
#11 0x0001afa2 in fdevent_linux_sysepoll_poll (ev=0x5fde0, timeout_ms=1000) at fdevent_impl.c:358
        fdn = 0x5c068
        revents = 1
        i = 0
        epoll_events = 0x63240
        n = 1
#12 0x0001ae56 in fdevent_poll (ev=0x5fde0, timeout_ms=1000) at fdevent_impl.c:314
        n = 375964
#13 0x0001336c in server_main_loop (srv=0x5c0b8) at server.c:2230
        mono_ts = 937
        sentinel = 0x5bc9c <log_con_jqueue>
        joblist = 0x5bc9c <log_con_jqueue>
        last_active_ts = 918
#14 0x0001350c in main (argc=6, argv=0xbef82d04) at server.c:2320
        srv = 0x5c0b8
        rc = 1

How can I try to reproduce this? Please share your build configuration on how you built lighttpd and what compiler (and compiler version!) you used.

My compiler is arm-linux-gnueabihf-gcc, and the version is 4.9.1 20140505. My compilation method is

 ./configure   --build=i686-linux  --host=arm-linux-gnueabihf --prefix=`pwd`/install   CC=/arm-x86-sysroots/i686-linux/usr/bin/arm-linux-gnueabihf/bin/arm-linux-gnueabihf-gcc AR=/arm-x86-sysroots/i686-linux/usr/bin/arm-linux-gnueabihf/bin/arm-linux-gnueabihf-ar CFLAGS='-I/home/sic-or/3rdparty/openlib/pcre2/lib/../include -I/arm-x86-sysroots/acp-am335x-cpm3/usr/include -I/arm-x86-sysroots/acp-am335x-cpm3/usr/include/c++/4.9.1 -I/arm-x86-sysroots/acp-am335x-cpm3/usr/include/c++/4.9.1/arm-linux-gnueabihf --sysroot=/arm-x86-sysroots/acp-am335x-cpm3' LDFLAGS='-L/arm-x86-sysroots/acp-am335x-cpm3/usr/lib --sysroot=/arm-x86-sysroots/acp-am335x-cpm3'  --with-sysroot=/arm-x86-sysroots/acp-am335x-cpm3  --with-openssl --with-openssl-includes=`pwd`/../../../openlib/openssl/lib/../include  --with-openssl-libs=`pwd`/../../../openlib/openssl/lib/ PCRE2_LIBS=`pwd`/../../../openlib/pcre2/lib/libpcre2-8.so &make

Did a prior version of lighttpd work on your system?

I have tried 1.4.69 and 1.4.73 so far, and both have this problem. If I need to try an earlier version, can you suggest one?

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss about 1 year ago

My linux kernel version is 3.12.0

Wow. That is ancient.

Is lighttpd spinning on the CPU here?

#3  0xb6bac658 in posix_spawn () from /lib/libc.so.6
No symbol table info available.
#4  0x00033428 in fdevent_fork_execve

That would be strange. Did you try executing the debugger command continue, and then pressing Ctrl-C a second later and running bt full again? You might do that a few times.

Given the age of your kernel, I would suggest that you try and build lighttpd with posix_spawn disabled. If you are using autotools, then after ./configure, edit config.h and comment out #define POSIX_SPAWN 1. Then compile. Confirm after compilation that nothing automatically re-ran autoreconf and modified config.h to undo your change.
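
The config.h edit described above can be scripted so that it is easy to repeat and to verify after each ./configure run. This is only a sketch: the function names are hypothetical, and the pattern assumes the macro name ends in POSIX_SPAWN (autoconf function checks normally produce names of the form HAVE_...; check your own config.h for the exact spelling).

```shell
#!/bin/sh
# Comment out the posix_spawn feature macro in an autotools-generated config.h.
# Assumption: the macro name ends in POSIX_SPAWN, e.g. "#define HAVE_POSIX_SPAWN 1".
# Longer names such as ..._POSIX_SPAWN_H are deliberately left untouched.
disable_posix_spawn() {
    cfg=$1
    tmp="$cfg.tmp.$$"
    # wrap the matching #define line in a C comment, then replace the file
    sed -E 's,^(#define[[:space:]]+[A-Za-z_]*POSIX_SPAWN[[:space:]].*)$,/* \1 */,' "$cfg" > "$tmp" &&
        mv "$tmp" "$cfg"
}

# Succeeds only if no active #define of the macro remains in the file.
posix_spawn_disabled() {
    ! grep -Eq '^#define[[:space:]]+[A-Za-z_]*POSIX_SPAWN[[:space:]]' "$1"
}
```

Run disable_posix_spawn config.h right after ./configure, and run posix_spawn_disabled config.h again once the build finishes to confirm that nothing regenerated the file and undid the change.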

You still have not answered my question if this has ever worked for you on an earlier version of lighttpd.

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by ouyang.wei.victor about 1 year ago

You still have not answered my question if this has ever worked for you on an earlier version of lighttpd.

No, I have tried 1.4.69 and 1.4.73 and found that they both have this problem. Maybe I need to try a previous version?

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss about 1 year ago

What is the provenance of /lib/libc.so.6? glibc? musl? Something else? What version?

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by gstrauss about 1 year ago

You still have not answered my question if this has ever worked for you on an earlier version of lighttpd.

No, I have tried 1.4.69 and 1.4.73 and found that they both have this problem. Maybe I need to try a previous version?

The latest stable release of lighttpd is almost always the best version to run for both security and reliability. If running on your ancient kernel and glibc 2.18 (from 2013) exposes bugs in those areas -- bugs that almost certainly have been fixed in the 10+ years since they were released -- then you might do as I described above and disable the feature detection in the lighttpd build configuration.

While lighttpd is spinning on the CPU, please grab a number of stack traces, continue, Ctrl-C, bt full, continue, Ctrl-C, bt full, continue, Ctrl-C, bt full, continue, Ctrl-C, bt full, continue, Ctrl-C, bt full, continue, Ctrl-C, bt full so that I might be able to get an idea what might be happening. Depending on how tight the loop, you might also attach debugger and step, step, step, etc (Press Enter to repeat the command) and step a few hundred or thousand instructions to see what functions lighttpd is in and what is happening.
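
The repeated continue / Ctrl-C / bt full cycle above can be approximated non-interactively by attaching gdb in batch mode several times. This is a rough sketch, not a lighttpd tool: sample_stacks and the GDB variable are made-up names, and it assumes a gdb that accepts -p, -batch, and -ex and that you have permission to attach to the process.

```shell
#!/bin/sh
# Debugger command; overridable so the loop can be dry-run with GDB=echo.
GDB=${GDB:-gdb}

# sample_stacks PID [COUNT] [INTERVAL_SECONDS]
# Prints COUNT "bt full" snapshots of the process, INTERVAL_SECONDS apart.
sample_stacks() {
    pid=$1
    count=${2:-6}
    interval=${3:-1}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i ==="
        # -batch makes gdb exit after the -ex command, which detaches and
        # lets the process keep running until the next snapshot
        $GDB -p "$pid" -batch -ex "bt full" 2>&1
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, sample_stacks 7372 6 > stacks.txt would collect six snapshots of process 7372 into one file that can be pasted as text. If the same few functions appear in every snapshot, that points to where the process is spending its time.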

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by ouyang.wei.victor about 1 year ago

I changed the configuration back to the previous one (h2+https) and added some debug logs in mod_openssl

log_error(hctx->errh, __FILE__, __LINE__,
          "SSL: read start,mem:%p,len:%d", mem, mem_len);
len = SSL_read(hctx->ssl, mem, mem_len);
log_error(hctx->errh, __FILE__, __LINE__,
          "SSL: read end len:%d", len);

After several repeated tests, I found that every time a problem occurred, it would be blocked in SSL_read() for up to 18s. Then lighttpd no longer responded to any requests.

Nov 16 16:17:21 sic daemon.err lighttpd[13356]: (h1.c.284) network_read start
Nov 16 16:17:21 sic daemon.err lighttpd[13356]: (mod_openssl.c.3221) SSL: connection_read_cq_ssl
Nov 16 16:17:21 sic daemon.err lighttpd[13356]: (mod_openssl.c.3230) SSL:SSL_pending
Nov 16 16:17:21 sic daemon.err lighttpd[13356]: (mod_openssl.c.3236) SSL: read start,mem:0xffbf8,len:8192
Nov 16 16:17:39 sic daemon.err lighttpd[13356]: (mod_openssl.c.3239) SSL: read end len:24

Before that, lighttpd received an RST_STREAM frame, and the connection state became CON_STATE_ERROR:

Nov 16 16:17:15 sic daemon.err lighttpd[13356]: (h2.c.598) h2_recv_rst_stream set r:0xdfa40 CON_STATE_ERROR
Nov 16 16:17:20 sic daemon.err lighttpd[13356]: (h2.c.598) h2_recv_rst_stream set r:0xdfa40 CON_STATE_ERROR

Attachment: messages (135 KB)

RE: After running for a period of time (about 40 minutes), lighttpd no longer responds to web page requests. - Added by ouyang.wei.victor about 1 year ago


I used tcpdump to capture network packets and found that there were network errors when the problem occurred, which may be related to this.
