Project

General

Profile

Bug #2784

huge amount of "SSL: -1 5 0 Success" messages when compiling against openssl 1.1.0

Added by idfnet 10 months ago. Updated 15 days ago.

Status:
Invalid
Priority:
Normal
Assignee:
-
Category:
TLS
Target version:
Start date:
2017-01-17
Due date:
% Done:

0%

Estimated time:
Missing in 1.5.x:

Description

Description:

when compiling lighttpd-1.4.45 (not tested with other versions) against openssl-1.1.0c i get a huge amount of

2017-01-17 10:08:21: (connections-glue.c.166) SSL: -1 5 0 Success 
2017-01-17 10:08:24: (connections-glue.c.166) SSL: -1 5 0 Success 
2017-01-17 10:08:25: (connections-glue.c.166) SSL: -1 5 0 Success 

messages in lighttpds error log file.
Clients can connect without problems. No issues, just lots of log messages.

When compiling lighttpd-1.4.45 against openssl-1.0.2j, no "Success" messages are written to the error log, just the usual "real" error messages, indicating client connection problems.

steps to reproduce:

compile against openssl 1.1.0

Associated revisions

Revision 06d10885 (diff)
Added by gstrauss 15 days ago

[mod_openssl] quiet trace from TCP probes (#2784)

x-ref:
"huge amount of "SSL: -1 5 0 Success" messages"
https://redmine.lighttpd.net/issues/2784

History

#1

Updated by gstrauss 10 months ago

  • Subject changed from huge ammount of "SSL: -1 5 0 Success" messages when compiling against openssl 1.1.0 to huge amount of "SSL: -1 5 0 Success" messages when compiling against openssl 1.1.0

Thanks for the report.

I will attempt to look further, but this appears to be an error in openssl 1.1.0c. Would you try against an earlier pre-release version of openssl 1.1.0 (e.g. 1.1.0a or 1.1.0b) to see if the error occurs there, too? A number of compatibility issues have been reported over the past few months with lighttpd and openssl 1.1.0 (and the issues have been addressed), but this is the first time someone is reporting this spew to the error log.

lighttpd calls ERR_clear_error() to ensure that openssl error queue is empty, then calls SSL_read(). If an error is returned (-1), lighttpd saves the current value of errno and calls SSL_get_error(). The return value you are reporting from SSL_get_error() is SSL_ERROR_SYSCALL (5), which means that the error should be in errno, but you are seeing errno == 0 (Success). For SSL_ERROR_SYSCALL, lighttpd also loops and logs all results from openssl ERR_get_error(), but you haven't reported anything additional in your error logs that begins with "SSL: ..."

Clients can connect without problems. No issues, just lots of log messages.

That would seem to point to SSL_read() failing at or near the end of handling the request. Maybe openssl should be returning 0 for that SSL_read() instead of -1. I am just guessing at the moment.

Does this spew occur for all TLS requests? (i.e. is it reliably reproducible with lighttpd 1.4.45 and openssl 1.1.0c ?)
Is it specific to certain client browsers?

#2

Updated by idfnet 10 months ago

Would you try against an earlier pre-release version of openssl 1.1.0 (e.g. 1.1.0a or 1.1.0b) to see if the error occurs there, too?

I just tested this:
There are no log messages with openssl-1.1.0a and openssl-1.1.0b. Problem starts with openssl-1.1.0c.

[...] but you haven't reported anything additional in your error logs that begins with "SSL: ..."

Just to clarify: there are "real" error messages in the log file like

(connections-glue.c.166) SSL: -1 5 104 Connection reset by peer
(connections-glue.c.200) SSL: 1 error:1417A0C1:SSL routines:tls_post_process_client_hello:no shared cipher

was not mentioned because this was expected, and seems to have no relation to the connections that gets logged with "SSL: -1 5 0 Success"

That would seem to point to SSL_read() failing at or near the end of handling the request. Maybe openssl should be returning 0 for that SSL_read() instead of -1. I am just guessing at the moment.
Does this spew occur for all TLS requests? (i.e. is it reliably reproducible with lighttpd 1.4.45 and openssl 1.1.0c ?)
Is it specific to certain client browsers?

Seems to happen with all clients when the connection is closed (tested with curl, wget, firefox, chrome).
Reused keep-alive connections generates the log entry only once when connection gets closed.

#3

Updated by gstrauss 10 months ago

Thank you very much for that extra info. That points to a regression in openssl 1.1.0 between 1.1.0b and 1.1.0c.
Only 121 commits between openssl 1.1.0b and 1.1.0c. I might have some time tomorrow to bisect.

(As an aside, I had asked for the "SSL:" log entries since when SSL_read() fails, the error might be in errno, or it might be in the openssl error queue, which lighttpd prints to the error log, and then prints SSL: -1 5 ... with the errno info. Since errno was 0, I was wondering if there was any related trace from the openssl error queue, but apparently there is not.)

#4

Updated by gstrauss 10 months ago

Hmmm. I was not able to reproduce the error with OpenSSL 1.1.0c on a quick test with curl -k and a self-signed certificate.

Would you share your lighttpd.conf for SSL (ssl.* settings)? (change the names for privacy and do not include passwords)

Anything notable about your certificates that would be different from a self-signed certificate? Do you have wildcard certs, or lots of domain aliases in the cert?

When you are building, are you on a system that has ccache installed? For this testing, you might disable the use of the ccache, e.g. by specifying the compiler on the command line: make clean; make CC=/usr/bin/gcc CC_FOR_BUILD=/usr/bin/gcc ... If you do a clean build and skip any compiler object cache, do you still see the problem with OpenSSL 1.1.0c?

#5

Updated by idfnet 10 months ago

Would you share your lighttpd.conf for SSL (ssl.* settings)? (change the names for privacy and do not include passwords)

ssl.engine        = "enable" 
ssl.pemfile        = "/PATH/TO/PEMFILE" 
ssl.ca-file        = "/PATH/TO/CAFILE" 
ssl.use-sslv3        = "disable" 
ssl.honor-cipher-order  = "enable" 
ssl.cipher-list     = "ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-SHA256:AES256-GCM-SHA384:AES256-SHA256:AES128-GCM-SHA256:AES128-SHA256:HIGH:!MD5:!aNULL:!EDH" 

Anything notable about your certificates that would be different from a self-signed certificate? Do you have wildcard certs, or lots of domain aliases in the cert?

it's an official signed wildcard-certificate "domain.tld", "*.domain.tld"

When you are building, are you on a system that has ccache installed? For this testing, you might disable the use of the ccache, e.g. by specifying the compiler on the command line: make clean; make CC=/usr/bin/gcc CC_FOR_BUILD=/usr/bin/gcc ... If you do a clean build and skip any compiler object cache, do you still see the problem with OpenSSL 1.1.0c?

The build that showed this behaviour was build from a clean source dir with no ccache installed.

#6

Updated by gstrauss 10 months ago

Do you have anything between the client and the server (such as a hardware load balancer) which might affect the SSL connection?

In my read through the source code differences between OpenSSL 1.1.0b and OpenSSL 1.1.0c, I identified a possible culprit as

commit 122580ef71e4e5f355a1a104c9bfb36feee43759
Author: Matt Caswell <matt@openssl.org>
Date:   Fri Oct 21 13:25:19 2016 +0100

    A zero return from BIO_read()/BIO_write() could be retryable

    A zero return from BIO_read()/BIO_write() could mean that an IO operation
    is retryable. A zero return from SSL_read()/SSL_write() means that the
    connection has been closed down (either cleanly or not). Therefore we
    should not propagate a zero return value from BIO_read()/BIO_write() back
    up the stack to SSL_read()/SSL_write(). This could result in a retryable
    failure being treated as fatal.

    Reviewed-by: Richard Levitte <levitte@openssl.org>
    (cherry picked from commit 4880672a9b41a09a0984b55e219f02a2de7ab75e)

On github, that is
https://github.com/openssl/openssl/commit/122580ef71e4e5f355a1a104c9bfb36feee43759
No notes there, but there is quite a discussion on the original commit
https://github.com/openssl/openssl/commit/4880672a9b41a09a0984b55e219f02a2de7ab75e
and in
https://github.com/openssl/openssl/issues/1903

I am reading through the github issue 1903, but it appears that what you are seeing may be an unclean openssl shutdown.
If that is the case, lighttpd should not bother to log this, and I am at the moment undecided if it should even be logged when debug.log-ssl-noise = "enable" in lighttpd.conf. I am leaning against the extra noise since it sounds like it is happening in all or nearly all TLS connections in your environment.

The following will almost assuredly make the error go away for you, but I have more reading to do before considering it the right solution for both openssl 1.1.0 and for earlier openssl versions (and also for libressl).

--- a/src/connections-glue.c
+++ b/src/connections-glue.c
@@ -155,14 +155,13 @@ static int connection_handle_read_ssl(server *srv, connection *con) {
                         *   (for socket I/O on Unix systems, consult errno for details).
                         *
                         */
-                       while((ssl_err = ERR_get_error())) {
-                               /* get all errors from the error-queue */
-                               log_error_write(srv, __FILE__, __LINE__, "sds", "SSL:",
-                                               r, ERR_error_string(ssl_err, NULL));
-                       }
-
-                       switch(oerrno) {
-                       default:
+                       if ((ssl_err = ERR_get_error())) {
+                               do {
+                                       /* get all errors from the error-queue */
+                                       log_error_write(srv, __FILE__, __LINE__, "sds", "SSL:",
+                                                       r, ERR_error_string(ssl_err, NULL));
+                               } while((ssl_err = ERR_get_error()));
+                       } else if (oerrno) {
                                log_error_write(srv, __FILE__, __LINE__, "sddds", "SSL:",
                                                len, r, oerrno,
                                                strerror(oerrno));
#7

Updated by gstrauss 10 months ago

  • Status changed from New to Invalid

https://bugs.python.org/issue28689
https://github.com/openssl/openssl/issues/1919 (closed as dup of github issue 1903)
https://github.com/openssl/openssl/commit/beacb0f0c1ae7b0542fe053b95307f515b578eb7

Make SSL_read and SSL_write return the old behaviour and document it.

This reverts commit 4880672.

From the bugs.python.org page, it looks like this behavior change in openssl 1.1.0c was reverted and is fixed in the branch that will be openssl 1.1.0d

#8

Updated by ondrejj 17 days ago

Looks like this bug is still or again present:

2017-11-05 19:19:26: (mod_openssl.c.1369) SSL: -1 5 0 Success
2017-11-05 19:19:26: (mod_openssl.c.1369) SSL: -1 5 0 Success

Versions:

lighttpd-1.4.47-1.fc26.x86_64
openssl-1.1.0f-7.fc26.x86_64

On my server more than 1 GB data in error.log after a week. Most of messages are same.
After some statistics looks like aprox. 10 Success messages for every normal request on server. :-(

#9

Updated by gstrauss 17 days ago

Looks like this bug is still or again present:

Looks to me that you should be able to provide more context and data. If you downgrade lighttpd, does the problem go away? Or if you downgrade openssl, does the problem go away? Have you tried a different system? Have others reported this?

Hint: you said on IRC that this problem has been occurring for you for at least 4 weeks. lighttpd 1.4.47 was released only two weeks ago, so evidence currently points to something else on your system.

#10

Updated by ondrejj 16 days ago

Looks like this message is not displayed always, but it's displayed for empty requests. Try to connecto to port 443 using telnet or netcat and then disconnect. For example:

telnet localhost 443
^D

or

echo | nc localhost 443

Looks like there is lot's of empty connects from various IP's to my server and I can't filter them, because there are no requests from them.

#11

Updated by gstrauss 15 days ago

This is not an issue with lighttpd code and probably something you should report on openssl forums. The following patch will quiet the trace from lighttpd, but you still probably want to look into what is probing your network so frequently. Is it a TCP connection probe health check from a load balancer?

--- a/src/mod_openssl.c
+++ b/src/mod_openssl.c
@@ -1451,6 +1451,14 @@ connection_read_cq_ssl (server *srv, connection *con,

             switch(oerrno) {
             default:
+                /* (oerrno should be something like ECONNABORTED not 0
+                 *  if client disconnected before anything was sent
+                 *  (e.g. TCP connection probe), but it does not appear
+                 *  that openssl provides such notification, not even
+                 *  something like SSL_R_SSL_HANDSHAKE_FAILURE) */
+                if (0==oerrno && 0==cq->bytes_in && !hctx->conf.ssl_log_noise)
+                    break;
+
                 log_error_write(srv, __FILE__, __LINE__, "sddds", "SSL:",
                         len, r, oerrno,
                         strerror(oerrno));

#12

Updated by ondrejj 15 days ago

Thanks. Will this patch be applied to lighttpd stable?

I was already checked, what causes this on my server, but looks like it's comming from different IP addresses from internet. No my local addresses. Lighttpd is providing content as official Fedora and Clamav mirror. May be one of them has an problem, which causes this, just it's very hard to find what causes this, because there is no information in these packets.

Also available in: Atom