Project

General

Profile

Bug #2778

larger memory usage for file uploads via SSL on embedded system

Added by ste 9 months ago. Updated 9 months ago.

Status:
Fixed
Priority:
Normal
Assignee:
-
Category:
core
Target version:
Start date:
2017-01-09
Due date:
% Done:

100%

Estimated time:
Missing in 1.5.x:

Description

After upgrading lighttpd from 1.4.39 to 1.4.44 we can no longer upload larger files via SSL.

This is an embedded system with only some MB of free memory.

Uploading a 3MB file fails with

(mod_cgi.c.1213) fork failed: Cannot allocate memory

I tracked the regression down to the following commit:

commit a95aaa9de984dc004dd1d4302147e8a0c23efb10
Author: Glenn Strauss <gstrauss@gluelogic.com>
Date:   Wed Jul 27 05:54:25 2016 -0400

    [TLS] read all available records from SSL_read()

    read all available records from SSL_read(), even if larger than
    MAX_READ_LIMIT, since the data is already in memory.  openssl is
    configured with SSL_MODE_RELEASE_BUFFERS and will release openssl
    buffers once records have been read.

    Without reading available data, there was a chance that the connection
    would hang waiting for a read event on the fd, even though all the
    data had already been read from kernel socket buffers and was in openssl
    memory waiting to be read with SSL_read().

    (thx glen and avij)

In older versions at most MAX_READ_LIMIT bytes were read into the internal buffer before flushing them into tempfiles in server.upload-dirs.

In the commit above the MAX_READ_LIMIT was removed. Now several MB are buffered before they are written to temp files. This behaviour exhausts the memory on our small system.

This issue only affects SSL connections, unencrypted connections are correctly limited by MAX_READ_LIMIT.

Currently we are using the attached patch as a workaround. It basically reverts the commit above and parts of two other commits.

ssl-read-limit.patch (1.34 KB) ssl-read-limit.patch ste, 2017-01-09 14:52
ssl.read-ahead.patch (3.56 KB) ssl.read-ahead.patch gstrauss, 2017-01-10 13:24

Associated revisions

Revision b03c4962 (diff)
Added by gstrauss 9 months ago

[TLS] ssl.read-ahead = "disable" for low mem (fixes #2778)

new directive ssl.read-ahead = "enable"/"disable" to control
SSL_CTX_set_read_ahead(). Default "enable". The "disable" setting
is intended for use on low memory systems with a slow CPU which is
unable to keep up with decryption of large request bodies.

x-ref:
"larger memory usage for file uploads via SSL on embedded system"
https://redmine.lighttpd.net/issues/2778

History

#1 Updated by gstrauss 9 months ago

  • Subject changed from high memory usage for file uploads via SSL to larger memory usage for file uploads via SSL on embedded system

Thanks for the report.

The purpose of continuing to read everything from openssl was that, theoretically, openssl had already read the current data set into memory, or the kernel already had read the current data set into kernel socket buffers, so the memory was theoretically already in use. If openssl was then exploding the data to a larger size (e.g. if data had been compressed with zlib on the transport) this could impact the memory use.

Now, if the data had been read into openssl buffers, there was a chance that lighttpd might not get a signal for ready data on the socket (since the ready data was in openssl buffer, not kernel socket buffers), and there could be a large delay or timeout before such pending data was processed. Since your patch adds the connection to the joblist, that should handle that scenario.

I'll take a closer look to see if there might be a better way to get the data into temporary files sooner, rather than looping reading into memory chunks in con->read_queue.

#2 Updated by gstrauss 9 months ago

Since reading request expects mem chunks, appending to the joblist might be reasonable.
At the same time, items are added to the joblist once, and so there might still be the condition I mentioned above. (server.c could be modified to set con->in_joblist = 0 before re-processing the connection, but that could theoretically end up with an ever-growing joblist as items kept getting appended to the list before it could complete walking the list)

Is your embedded processor handling openssl encryption/decryption more slowly than incoming network data? That could lead to the larger memory usage. A possible solution might be to reduce the size of the kernel socket buffers. You might test with lower net.core.rmem_default and net.core.rmem_max, net.ipv4.tcp_mem, and net.ipv4.tcp_rmem in /etc/sysctl.conf. (Set in /proc, too, and restart lighttpd to take effect immediately, or else reboot after changing /etc/sysctl.conf)

#3 Updated by gstrauss 9 months ago

Note that lighttpd sets SSL_CTX_set_default_read_ahead() to a non-zero value to enable read-ahead. This might not be the best value for low-memory systems. (At the moment, this in not configurable in lighttpd.) It is tempting to use SSL_pending() to try to limit the amount of data read, while avoiding waiting for a socket even when data has already been read. However, there are bugs and limitations in openssl. As reported by 'man SSL_pending' on Linux:

BUGS
       SSL_pending() takes into account only bytes from the TLS/SSL record
       that is currently being processed (if any).  If the SSL object's
       read_ahead flag is set (see SSL_CTX_set_read_ahead(3)), additional
       protocol bytes may have been read containing more TLS/SSL records;
       these are ignored by SSL_pending().

       Up to OpenSSL 0.9.6, SSL_pending() does not check if the record type of
       pending data is application data.

There were reasons that a95aaa9d was made, and while reverting it may work in your scenario, it is likely to break other usage scenarios. MAX_READ_LIMIT is 256k. How large are your socket buffers that this limit is being reached? Is the situation that there is a client on the local network sending data on the wire faster than the embedded system CPU can decrypt it?

Configuring lighttpd to disable openssl read-ahead and then testing SSL_pending(con->ssl) > 0 might be a better solution. Would you be willing to test that? If the below works for you, I'll add a configuration option for this behavior, which can be enabled by you and others on very low-memory systems with slow CPUs.

--- a/src/connections-glue.c
+++ b/src/connections-glue.c
@@ -127,7 +127,7 @@ static int connection_handle_read_ssl(server *srv, connection *con) {
                        connection_set_state(srv, con, CON_STATE_ERROR);
                        return -1;
                }
-       } while (len > 0);
+       } while (len > 0 && SSL_pending(con->ssl) > 0);

        if (len < 0) {
                int oerrno = errno;
diff --git a/src/network.c b/src/network.c
index b46dcf7..c7aa96a 100644
--- a/src/network.c
+++ b/src/network.c
@@ -994,7 +994,7 @@ int network_init(server *srv) {
                                        s->ssl_pemfile);
                        return -1;
                }
-               SSL_CTX_set_default_read_ahead(s->ssl_ctx, 1);
+               SSL_CTX_set_default_read_ahead(s->ssl_ctx, 0);
                SSL_CTX_set_mode(s->ssl_ctx,  SSL_CTX_get_mode(s->ssl_ctx)
                                            | SSL_MODE_ENABLE_PARTIAL_WRITE
                                            | SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER

Another option might be to check for MAX_READ_LIMIT and then add a test for pending data on the socket. If there is still pending data on the socket, then return what we have, since another fdevent will wake us up again to do more reading. However, if there is no pending data on the socket, then keep reading the buffered data from openssl, since we won't be woken up by another fdevent. (I would like to avoid adding joblist_append() here, as joblist_append() would then need some improvements, too, to handle edge cases that were not previously handled well.)

#4 Updated by ste 9 months ago

Yes, the sending of data is definitely much faster than the CPU can decrypt on our system.

I can also reproduce the problem on a desktop computer. If I send a 100MB file via localhost, the memory usage sometimes goes up to 20MB and more (and also stays high). With the patch, it stays around 6MB.
Of course this is unlikely to be a big issue on modern systems.

The solution using SSL_pending() sounds good. Initially I wanted to fix it using SSL_pending(), but refrained from doing so because of the BUGs section. As you disabled read_ahead, it should be fine.

The patch seems good so far if I add the check for len. The unmodified patch breaks uploads.

diff --git a/src/connections-glue.c b/src/connections-glue.c
index d30c229..5c15c4b 100644
--- a/src/connections-glue.c
+++ b/src/connections-glue.c
@@ -127,7 +127,7 @@ static int connection_handle_read_ssl(server *srv, connection *con) {
             connection_set_state(srv, con, CON_STATE_ERROR);
             return -1;
         }
-    } while (len > 0);
+    } while (len > 0 && SSL_pending(con->ssl) > 0);

     if (len < 0) {
         int oerrno = errno;
@@ -206,12 +206,14 @@ static int connection_handle_read_ssl(server *srv, connection *con) {
         connection_set_state(srv, con, CON_STATE_ERROR);

         return -1;
-    } else { /*(len == 0)*/
+    } else if (len == 0) {
         con->is_readable = 0;
         /* the other end close the connection -> KEEP-ALIVE */

         return -2;
     }
+
+    return 0;
 #else
     UNUSED(srv);
     UNUSED(con);

#5 Updated by gstrauss 9 months ago

Thanks for noting the additional check for (len == 0). It got lost as I was trying different things.

I'll look into create a new config directive to disable read-ahead, and to check SSL_pending() (as in the patch above) when openssl read-ahead is disabled.

#6 Updated by gstrauss 9 months ago

Attached is a patch for new directive ssl.read-ahead = "enable" (default), or "disable". Please test with ssl.read-ahead = "disable" to confirm this meets your needs. Thank you.

#7 Updated by ste 9 months ago

Works as expected with ssl.read-ahead = "disable". Thank you!

Will report back if we run into any issues.

#8 Updated by gstrauss 9 months ago

  • Status changed from Patch Pending to Fixed
  • % Done changed from 0 to 100

#9 Updated by ste 9 months ago

I've retested with 1.4.45. Either there is a new problem or my previous tests were incomplete.

In src/connections-glue.c line 130 "con->conf.ssl_read_ahead" evaluates to 1 instead of 0.

The config (ssl.read-ahead = "disable") seems to be parsed correctly, because in other places the value is correct.

#10 Updated by gstrauss 9 months ago

That sounds like a problem in how your configuration is parsed. Would you share your (entire) lighttpd.conf, replacing sensitive info with xxxxx?
/usr/bin/lighttpd -f /etc/lighttpd.conf -p
will print the parsed config, including all include files.

#11 Updated by ste 9 months ago

$ sudo  ./build/lighttpd -D -f /etc/lighttpd/lighttpd.conf  -p
config {
    var.PID                 = 25364
    var.CWD                 = "/home/<XXX>/lighttpd1.4" 
    server.modules          = ("mod_cgi", "mod_redirect", "mod_auth", "mod_accesslog")
    server.max-connections  = 8
    server.max-request-size = 4096
    server.username         = "www-data" 
    server.groupname        = "www-data" 
    server.document-root    = "/home/httpd" 
    cgi.assign              = (
        ".cgi" => "",
    )

    $SERVER["socket"] == ":443" {
        # block 1
        ssl.engine      = "enable" 
        ssl.pemfile     = "/etc/server.pem" 
        ssl.cipher-list = "aRSA+HIGH !3DES +kEDH +kRSA !kSRP !kPSK" 
        ssl.read-ahead  = "disable" 
        ssl.use-sslv3   = "disable" 

    } # end of $SERVER["socket"] == ":443" 
}

#12 Updated by gstrauss 9 months ago

The default value of ssl.read-ahead is 1.

In src/connections-glue.c line 130 "con->conf.ssl_read_ahead" evaluates to 1 instead of 0.

Are you seeing that for connections to port 443? Or on connections to port 80 (the default port)? You did not specify server.bind or server.port, so lighttpd listens on "*:80" by default.

The info you provided suggests you were troubleshooting with a debugger. What behavior were you seeing? Same as in the original post?

#13 Updated by ste 9 months ago

I added two debug statements:

diff --git a/src/connections-glue.c b/src/connections-glue.c
index 80104b6..e42be37 100644
--- a/src/connections-glue.c
+++ b/src/connections-glue.c
@@ -106,6 +106,8 @@ static int connection_handle_read_ssl(server *srv, connection *con) {

        if (!con->srv_socket->is_ssl) return -1;

+       fprintf(stderr, "con->conf.ssl_read_ahead=%i\n", con->conf.ssl_read_ahead);
+
        ERR_clear_error();
        do {
                chunkqueue_get_memory(con->read_queue, &mem, &mem_len, 0, SSL_pending(con->ssl));
diff --git a/src/network.c b/src/network.c
index 4295fe9..3a908f3 100644
--- a/src/network.c
+++ b/src/network.c
@@ -994,6 +994,7 @@ int network_init(server *srv) {
                                        s->ssl_pemfile);
                        return -1;
                }
+               fprintf(stderr, "s->ssl_read_ahead: %i\n", s->ssl_read_ahead);
                SSL_CTX_set_default_read_ahead(s->ssl_ctx, s->ssl_read_ahead);
                SSL_CTX_set_mode(s->ssl_ctx,  SSL_CTX_get_mode(s->ssl_ctx)
                                            | SSL_MODE_ENABLE_PARTIAL_WRITE

Here's what happens:

$ sudo  ./build/lighttpd -D -f /etc/lighttpd/lighttpd.conf  
s->ssl_read_ahead: 0
2017-01-31 15:59:18: (/home/<XXX>lighttpd1.4/src/log.c.217) server started

# wget https://127.0.0.1/foo.cgi" is executed at this point

con->conf.ssl_read_ahead=1
con->conf.ssl_read_ahead=1
con->conf.ssl_read_ahead=1
con->conf.ssl_read_ahead=1
con->conf.ssl_read_ahead=1

edit: Yes, behavior is like described in original post because SSL_pending() is not evaluated.

#14 Updated by gstrauss 9 months ago

I'll try to reproduce using your config.

In the meantime, if you put ssl.read-ahead = "disable" in the global scope, then things should behave as you expect.

#15 Updated by ste 9 months ago

gstrauss wrote:

I'll try to reproduce using your config.

In the meantime, if you put ssl.read-ahead = "disable" in the global scope, then things should behave as you expect.

Thanks, using global scope works for now.

#16 Updated by gstrauss 9 months ago

Some instrumentation shows that in lighttpd 1.4.45, ssl.read-ahead = "disable" will only work properly when set in the global scope. This is because the information is needed early in the request processing, before the request headers are read. Config settings get reset between requests, and ssl.read-ahead must be persistent connection-level state. Now then, if you're on a low memory system, the expected use is to enable this globally, rather than repeating it in one or more $SERVER["socket"] blocks.

As an aside, in my dev branch for lighttpd 1.4.46, this is not an issue because lighttpd 1.4.46 introduces connection-level state that is persistent for the entire connection, and the new mod_openssl handles ssl.read-ahead at the point where connection is accept()ed, preserving the value for all requests on the connection.

#17 Updated by ste 9 months ago

You're right: Setting it in the global scope is probably better anyhow. The main reason we were setting all SSL options within the $SERVER["socket"] block is that most sample configs for dual HTTP/HTTPS setups also do it.

Even better if it will work either way in 1.4.46.

Thanks again for the fast response.

Also available in: Atom