Project

General

Profile

[Solved] Lighttpd - Regression since .46

Added by adrien over 6 years ago

Hello guys,

First, thx for your work, big fan of lighttpd for years, always do the job :)

But : we have a problem with lighttpd version >= 46.

We have a lot of Iot devices who use soap web service (with gSoap implementation) and they work well behing the mod_cgi module.

But we updated the lighttpd version (to use WS) at version 49 but now sometimes (often) the web service call returns SoapFault exception: [Client] looks like we got no XML document.

First, we check the response data, and it is ok.
Second, we downgrade the lighttpd version (to the original version 35) -> back to normal -no bug
Third, iterate until version 45 -> working well, version 46 -> web service client start again to return errors.

We tried to disable the mod_proxy (used for the WS), we still have the issue.

We isolated the problem (we undo all changes except lighttpd version) and we still have this behavior.

One more information : it seems to happends randomely.

WebService configuration :

server.modules += ( "mod_cgi" )
server.breakagelog = "/var/log/cgi.log"

$HTTP["url"] == "/cgi-bin/ast_web_service" {
cgi.assign = ( "" => "" )
}

cgi.assign = ( ".cgi" => "" )

Any tips to resolve this behaviro is welcome.


Replies (18)

RE: Lighttpd - Regression since .46 - Added by gstrauss over 6 years ago

Please include your entire lighttpd config.

There is a fix in lighttpd git master for #2878 if you're using server.stream-response-body = 2

What is the OS and kernel version on which you are running?

There is a fix in lighttpd git on my dev branch for #2883 for embedded systems running very old linux kernels.

Your complete lighttpd config (of the reduced version to reproduce the issue) might be helpful for further troubleshooting.

lighttpd -f /etc/lighttpd/lighttpd.conf -p

RE: Lighttpd - Regression since .46 - Added by adrien over 6 years ago

config {
    var.PID                     = 18873
    var.CWD                     = "/" 
    server.port                 = 80
    server.username             = "http" 
    server.groupname            = "http" 
    server.document-root        = "/var/recas/www" 
    server.errorlog             = "/var/log/lighttpd.log" 
    dir-listing.activate        = "disable" 
    index-file.names            = ("index.php")
    mimetype.assign             = (
        ".html" => "text/html",
        ".txt"  => "text/plain",
        ".css"  => "text/css",
        ".js"   => "application/x-javascript",
        ".jpg"  => "image/jpeg",
        # 5
        ".jpeg" => "image/jpeg",
        ".gif"  => "image/gif",
        ".png"  => "image/png",
        ""      => "application/octet-stream",
        # 9
    )
    auth.require                = (
        "/cgi-bin/" => (
            "method"  => "basic",
            "realm"   => "Password protected area",
            "require" => "valid-user",
            # 3
        ),
    )
    auth.debug                  = 2
    auth.backend                = "plain" 
    auth.backend.plain.userfile = "/var/recas/conf/users" 
    compress.filetype           = ("application/x-javascript", "text/css", "text/html", "text/plain")
    compress.cache-dir          = "/var/cache/lighttpd/compress/" 
    cgi.assign                  = (
        ".cgi" => "",
    )
    server.breakagelog          = "/var/log/cgi.log" 
    fastcgi.server              = (
        "/player" => (
            "ast_web_player_handler" => (
                "socket"          => "/tmp/ast_web_player.socket",
                "check-local"     => "disable",
                "bin-path"        => "/usr/local/bin/ast_web_player",
                "max-procs"       => 1,
                "kill-signal"     => 16,
                # 5
                "bin-environment" => (
                    "LD_LIBRARY_PATH"        => "/usr/local/lib:/usr/local/lib/gstreamer-1.0/",
                    "GST_PLUGIN_PATH"        => "/usr/local/lib/gstreamer-1.0/",
                    "GST_DEBUG_DUMP_DOT_DIR" => "/var/recas/log/dot/",
                    # 3
                ),
                # 6
            ),
        ),
        ".php"    => (
            (
                "bin-path"              => "/usr/bin/php-cgi",
                "socket"                => "/tmp/php.socket",
                "max-procs"             => 4,
                "bin-environment"       => (
                    "PHP_FCGI_CHILDREN"     => "4",
                    "PHP_FCGI_MAX_REQUESTS" => "10000",
                    # 2
                ),
                "bin-copy-environment"  => ("PATH", "SHELL", "USER", "LANG", "LANGUAGE"),
                # 5
                "broken-scriptfilename" => "enable",
                "allow-x-send-file"     => "enable",
                # 7
            ),
        ),
        # 2
    )
    fastcgi.map-extensions      = (
        ".inc" => ".php",
    )
    server.modules              = ("mod_auth", "mod_compress", "mod_cgi", "mod_fastcgi", "mod_proxy")

    $HTTP["url"] == "/cgi-bin/ast_web_service" {
        # block 1
        cgi.assign = (
            "" => "",
        )

    } # end of $HTTP["url"] == "/cgi-bin/ast_web_service" 

    $HTTP["url"] == "/ws/ptz" {
        # block 2
        proxy.server = (
            "" => (
                (
                    "host" => "127.0.0.1",
                    "port" => "7681",
                    # 2
                ),
            ),
        )
        proxy.header = (
            "upgrade" => "enable",
        )

    } # end of $HTTP["url"] == "/ws/ptz" 
}

RE: Lighttpd - Regression since .46 - Added by adrien over 6 years ago

Kernel : 4.11.0+

Compile options :
./configure --without-bzip2 --prefix=/usr --sbindir=/usr/bin/ --with-openssl --without-pcre

I'm going to test with the master and the option server.stream-response-body = 2.

RE: Lighttpd - Regression since .46 - Added by adrien over 6 years ago

What is the svn address for the master ?

RE: Lighttpd - Regression since .46 - Added by gstrauss over 6 years ago

What is the svn address for the master ?

InstallFromSource

If you need bidirectional communication, and not just request/response, then try
Server_stream-request-bodyDetails
Server_stream-response-bodyDetails

server.stream-request-body = 1
server.stream-response-body = 1

or
server.stream-request-body = 2
server.stream-response-body = 2

You might also enable some additional debugging in lighttpd. See DebugVariables to enable additional trace, and, if you can, use 'strace' to see what lighttpd is doing when it "hangs" for you.

RE: Lighttpd - Regression since .46 - Added by adrien over 6 years ago

Thx for the links.

Version tested :
lighttpd/1.4.50-devel-lighttpd-1.4.49-7-g26fb8d3 (ssl) - a light and fast webserver

Configuration

config {
    var.PID                     = 20501
    var.CWD                     = "/" 
    server.port                 = 80
    server.username             = "http" 
    server.groupname            = "http" 
    server.document-root        = "/var/recas/www" 
    server.errorlog             = "/var/log/lighttpd.log" 
    server.dir-listing.activate = "disable" 
    index-file.names            = ("index.php")
    mimetype.assign             = (
        ".html" => "text/html",
        ".txt"  => "text/plain",
        ".css"  => "text/css",
        ".js"   => "application/x-javascript",
        ".jpg"  => "image/jpeg",
        # 5
        ".jpeg" => "image/jpeg",
        ".gif"  => "image/gif",
        ".png"  => "image/png",
        ""      => "application/octet-stream",
        # 9
    )
    auth.require                = (
        "/cgi-bin/" => (
            "method"  => "basic",
            "realm"   => "Password protected area",
            "require" => "valid-user",
            # 3
        ),
    )
    auth.backend                = "plain" 
    auth.backend.plain.userfile = "/var/recas/conf/users" 
    compress.filetype           = ("application/x-javascript", "text/css", "text/html", "text/plain")
    compress.cache-dir          = "/var/cache/lighttpd/compress/" 
    cgi.assign                  = (
        ".cgi" => "",
    )
    server.breakagelog          = "/var/log/cgi.log" 
    fastcgi.server              = (
        "/player" => (
            "ast_web_player_handler" => (
                "socket"          => "/tmp/ast_web_player.socket",
                "check-local"     => "disable",
                "bin-path"        => "/usr/local/bin/ast_web_player",
                "max-procs"       => 1,
                "kill-signal"     => 16,
                # 5
                "bin-environment" => (
                    "LD_LIBRARY_PATH"        => "/usr/local/lib:/usr/local/lib/gstreamer-1.0/",
                    "GST_PLUGIN_PATH"        => "/usr/local/lib/gstreamer-1.0/",
                    "GST_DEBUG_DUMP_DOT_DIR" => "/var/recas/log/dot/",
                    # 3
                ),
                # 6
            ),
        ),
        ".php"    => (
            (
                "bin-path"              => "/usr/bin/php-cgi",
                "socket"                => "/tmp/php.socket",
                "max-procs"             => 4,
                "bin-environment"       => (
                    "PHP_FCGI_CHILDREN"     => "4",
                    "PHP_FCGI_MAX_REQUESTS" => "10000",
                    # 2
                ),
                "bin-copy-environment"  => ("PATH", "SHELL", "USER", "LANG", "LANGUAGE"),
                # 5
                "broken-scriptfilename" => "enable",
                "allow-x-send-file"     => "enable",
                # 7
            ),
        ),
        # 2
    )
    fastcgi.map-extensions      = (
        ".inc" => ".php",
    )
    server.modules              = ("mod_auth", "mod_compress", "mod_cgi", "mod_fastcgi", "mod_proxy")
    server.stream-request-body  = 2
    server.stream-response-body = 2
    debug.log-request-handling  = "enable" 
    debug.log-response-header   = "enable" 
    debug.log-request-header    = "enable" 

    $HTTP["url"] == "/cgi-bin/ast_web_service" {
        # block 1
        cgi.assign = (
            "" => "",
        )

    } # end of $HTTP["url"] == "/cgi-bin/ast_web_service" 

    $HTTP["url"] == "/ws/ptz" {
        # block 2
        proxy.server = (
            "" => (
                (
                    "host" => "127.0.0.1",
                    "port" => "7681",
                    # 2
                ),
            ),
        )
        proxy.header = (
            "upgrade" => "enable",
        )

    } # end of $HTTP["url"] == "/ws/ptz" 
}

Attached the logs file.

--> Extract
2018-05-02 10:18:16: (response.c.122) Response-Header: \nHTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\nDate: Wed, 02 May 2018 08:18:16 GMT\r\nServer: lighttpd/1.4.50-devel-l$
2018-05-02 10:18:16: (mod_fastcgi.c.437) FastCGI-stderr: Failed to connect to the soap serviceSoapFault Object
2018-05-02 10:18:16: (mod_fastcgi.c.437) FastCGI-stderr: (
2018-05-02 10:18:16: (mod_fastcgi.c.437) FastCGI-stderr: [message:protected] => looks like we got no XML document

With the lastest commit on the master, I still have my wrong behavior.
NB : we add server.stream-request-body & server.stream-response-body

RE: Lighttpd - Regression since .46 - Added by gstrauss over 6 years ago

The errors look like they are coming back from your FastCGI program, not lighttpd. Are you, perhaps, attempting to have one request start something up via CGI, and then expecting it to persist while you access it via other means? That's not a good recipe. lighttpd will send the CGI a kill TERM signal at the end of the CGI response.

With lighttpd 1.4.46 and later, you can tune how many backends are started and have them scale up and down depending on load, including "min-procs" => 0. Docs_ConfigurationOptions

See Docs_ModWSTunnel for a means to have lighttpd handle the websockets, potentially leaving you with a simple gSOAP backend.

RE: Lighttpd - Regression since .46 - Added by adrien over 6 years ago

Backends:

Cgi -> gSoap
fcgi -> php
fcgi -> ast_web_player (video stream applications)
proxy -> libwebsockets

The point is that we call the gsoap service through php (which is the SoapClient FCGI) and, we print the error from here.

The php script call the cgi backend (the soap server) to process the request in C code.

Web request -> php soapClient (fcgi) -> soapServer (cgi)

I confirm that your program worked well until 1.4.46.

RE: Lighttpd - Regression since .46 - Added by gstrauss over 6 years ago

To see if my hunch is correct, why don't you catch the TERM signal in your gSOAP CGI program and log it. Then check your breakage log which you have configured to /var/log/cgi.log

Why doesn't your PHP simply execute the gSOAP program from your PHP? Why make a web request to do so? Why not do the processing in the PHP? Do you want your soapServer accessible to the internet?

Your setup sounds quite complex. Please strace your lighttpd process with strace -o /tmp/strace.log -f -tt and reproduce the error.

RE: Lighttpd - Regression since .46 - Added by adrien over 6 years ago

Hey,

I cleaned the scenario.

Client:
> Php script attached : client_script_php file
> log : client logs
Behavior : Call 1000 times the soap service (other device now) and print the request count and the error when it happends

Server:
Remove all backends except cgi with the gsoap implementation
> lighttpd conf = lighttpd_conf_light file
> Partial C source code = server_source
> lighttpd logs
> cgi logs
No particular observation. the Sigterm signal isn't fired on the cgi process, 1000 requests have been serve.

The wrong behavior happends ~10/1000.

For strace, unfortenalty, I can't install it on the embedded device quickly, but maybe there is other way the debug this behavior ?
It is possible to output all the response buffer ?

If you need, I can give you the access to the gsoap service.

RE: Lighttpd - Regression since .46 - Added by gstrauss over 6 years ago

Thank you for breaking this down. Unfortunately, I am heading out of town and won't be able to look at this more until later next week, but I'll try to get back to you then.

RE: Lighttpd - Regression since .46 - Added by gstrauss over 6 years ago

server_partial_source_code does not compile. It is missing includes (which might give a hint as to the libraries needed).

Providing this the way you did gives me a strong indication that you did not test that the code you provided exhibits the problem you are reporting.

Please don't waste my time and yours with mounds of unusable noise. Please provide code that can both compile and reproduce the problem, and provide instructions how to build it.

RE: Lighttpd - Regression since .46 - Added by adrien over 6 years ago

The filename was quite explicit -> it is part of our source code, not a simple and clear test case, but to simply demonstrate that we didn't get the signal and the soap function is quite simple, before digging.

Providing this the way you did gives me a strong indication that you did not test that the code you provided exhibits the problem you are reporting.
-> LOL!

Providing code is quite simple, any gsoap service reproduce the problem if you call it between 10~100 times.

Check tutorials here for GSOAP : https://www.genivia.com/tutorials.html

Attached : A full test case with minimal dependencies and commands for compilation

RE: Lighttpd - Regression since .46 - Added by avij over 6 years ago

Adrien, can you reproduce this problem outside your embedded system? What kind of CPU does your embedded system have and how much memory?

it would be important to be able to reproduce this problem in a more standard setup. You could, for instance, install and run strace easier than on your embedded system. It is also likely that the development system of gstrauss does not match what you have running on your embedded system.

RE: Lighttpd - Regression since .46 - Added by adrien over 6 years ago

I have my test running on a standard system (intel core i5 - 8GB) now, I will be back with the strace log (tomorrow).
On this environment, I reproduce the error 1/5000 requests.

In the meantime, I digged a bit, and saw that the error happends when the server response "Transfert-Encoding: chunked" in write_prepare(connection.c) (all good requests response the data correctly)

I also logged "log-state-handling" without additionnal information to provide.

RE: [Solved] Lighttpd - Regression since .46 - Added by gstrauss over 6 years ago

lighttpd has some historical compatibility behavior for poorly written CGI programs which do not emit a proper header before the body content. lighttpd detects this by checking whether the first line contains a colon (':'). lighttpd uses whatever it has read in the first read() from the pipe. lighttpd does not check for newline since the content might be very long and might not contain any newline, or the content might be very short and not contain any newline.

See http-header-glue.c line 1100

        /* no HTTP headers, but a body (special-case for CGI compat) */
        /* no colon found; does not appear to be HTTP headers */

In your case, stdsoap2.c appears to be sending response headers without buffering, so the C code is inefficiently making syscalls to flush data to the response pipe after each "header", ": ", and "value" in the CGI response headers. If lighttpd reads only "Status" from the pipe because the ": " has not yet been sent, then lighttpd sees a string which does not contain a ':' and therefore does not look like a CGI response header.

If you're interested in performance, you should look into enabling buffering of the response. Better, if you are interested in performance at the cost of some memory (your app running as a persistent FastCGI backend), you should consider configuring your application as a FastCGi application to avoid the overhead of using CGI on every request.

A specific patch to lighttpd to workaround this stdsoap2.c behavior is below, but since CGI response headers can be sent in any order, this patch may or may not be considered for a future lighttpd release:

--- a/src/http-header-glue.c
+++ b/src/http-header-glue.c
@@ -1097,6 +1097,12 @@ handler_t http_response_parse_headers(server *srv, connection *con, http_respons
             last_eol = i;
         }
     } else if (opts->backend == BACKEND_CGI) {
+        /* special-case gSoap stdsoap2.c which inefficiently sends parts of
+         * CGI response headers without any buffering, which might result in
+         * "Status" being received before the subsequent ": " is sent */
+        if (buffer_is_equal_caseless_string(b, CONST_STR_LEN("Status"))) {
+            return HANDLER_GO_ON;
+        }
         /* no HTTP headers, but a body (special-case for CGI compat) */
         /* no colon found; does not appear to be HTTP headers */
         if (0 != http_chunk_append_buffer(srv, con, b)) {

RE: [Solved] Lighttpd - Regression since .46 - Added by gstrauss over 6 years ago

Instead, I'll add this to better handle backends which inefficiently write only the label of the first response header line, separately from the first ':'.

--- a/src/http-header-glue.c
+++ b/src/http-header-glue.c
@@ -1096,6 +1096,7 @@ handler_t http_response_parse_headers(server *srv, connection *con, http_respons

             last_eol = i;
         }
+    } else if (i == header_len) { /* (no newline yet; partial header line?) */
     } else if (opts->backend == BACKEND_CGI) {
         /* no HTTP headers, but a body (special-case for CGI compat) */
         /* no colon found; does not appear to be HTTP headers */

RE: [Solved] Lighttpd - Regression since .46 - Added by adrien over 6 years ago

Thx for the patch, it's working well.

    (1-18/18)