Project

General

Profile

Actions

Bug #3056

closed

OCSP Stapling reload seems not to work

Added by flynn 7 months ago. Updated 6 months ago.

Status:
Fixed
Priority:
Normal
Category:
TLS
Target version:
ASK QUESTIONS IN Forums:
No

Description

From my observation the periodicylly reload of the OCSP stapling does not work, I always have to restart lighttpd to make it work again:
  • I generate an ocsp file
  • I use the option ssl.stapling-file
  • I check once a day to verify the state of the OCSP answer - it works
  • I check and regenerate the OCSP file once a day
  • if the valid duration of the OCSP file has ended, lighttpd does not send an OCSP-response anymore, although a new valid OCSP has been generated some hours ago
  • I do not see any related error messages

I t seems, that lighttpd respects the "Next Update:" in the current OCSP, but does not load the new one.

After a restart of lighttpd all is fine, check is valid.

After one week the same happens again ...

Actions #1

Updated by gstrauss 7 months ago

I will try to reproduce this. Which TLS module are you using? I presume mod_openssl

In the code, loading the ssl.stapling-file is a subroutine that is called at startup and the same code is called to reload. I need to try to find why the reloading might not occur.

Do you get an error in the lighttpd error log that says "certificate marked OCSP Must-Staple, but OCSP response expired from ssl.stapling-file" ?

Actions #2

Updated by flynn 7 months ago

I use the TLS module mod_openssl.

I do NOT see any ocsp related log messages in the error log.

Actions #3

Updated by gstrauss 7 months ago

In my tests, lighttpd walks the list of loaded certificates every 64 seconds, as expected. Any certificate with associated ssl.stapling-file is checked against the saved Next Update timestamp. I verified that the Next Update timestamp is properly parsed (in mod_openssl). If the Next Update field is not present or could not be parsed, lighttpd reloads ssl.stapling-file after 1 hour. lighttpd does not attempt to reload ssl.stapling-file unless Next Update timestamp will expire in <= 256 seconds.

flynn when you update the ssl.stapling-file, do you update the timestamp on the file? lighttpd does not reload a file if the timestamp on the file is <= the timestamp for when lighttpd last loaded ssl.stapling-file

Actions #4

Updated by gstrauss 7 months ago

(BTW, you would not get "certificate marked OCSP Must-Staple, but OCSP response expired from ssl.stapling-file" unless the certificate is marked OCSP Must-Staple and an updated ssl.stapling-file was not available or failed to be loaded)

Actions #5

Updated by flynn 7 months ago

The ssl.stapling-file has an updated timstamp which corresponds to the date of the cron job.

I ran 4 minutes strace, I cannot see, that my lighttps walks though the certificates every 64 seconds??
I cannot see any of the certificate filenames (access, openat, stat, ...) in the strace log.

Actions #6

Updated by gstrauss 7 months ago

lighttpd reads the "Next Update" timestamp when the ssl.stapling-file is initially read.

You won't see any system calls in strace for refreshing ssl.stapling-file until <= 256 seconds before "Next Update" expires.

(I can be convinced to change this to check the filesystem for an updated ssl.stapling-file at least daily if you think it beneficial)

Actions #7

Updated by gstrauss 7 months ago

I ran 4 minutes strace, I cannot see, that my lighttps walks though the certificates every 64 seconds??

lighttpd checks its in-memory list of certificates every 64 seconds.

TRIGGER_FUNC(mod_openssl_handle_trigger) {
    const plugin_data * const p = p_d;
    const time_t cur_ts = log_epoch_secs;
    if (cur_ts & 0x3f) return HANDLER_GO_ON; /*(continue once each 64 sec)*/

Actions #8

Updated by gstrauss 7 months ago

Check your ssl.stapling-file with
openssl ocsp -respin /path/to/staple.der -resp_text -noverify
and look for "Next Update"

Actions #9

Updated by flynn 7 months ago

Looks good:

Cert Status: good
This Update: Jan 12 11:00:00 2021 GMT
Next Update: Jan 19 11:00:00 2021 GMT

The same as in the response of openssl s_client test.
What else can I do to debug this?

Actions #10

Updated by gstrauss 7 months ago

If you like, we can try to narrow down what is not working correctly by patching some code.

With the patch below, you should see lighttpd stat() the ssl.stapling-file every 64 seconds with strace, and you should see it reloaded if the timestamp on ssl.stapling-file has changed.

--- a/src/mod_openssl.c
+++ b/src/mod_openssl.c
@@ -1528,8 +1528,8 @@ mod_openssl_reload_stapling_file (server *srv, plugin_cert *pc, const time_t cur
 static int
 mod_openssl_refresh_stapling_file (server *srv, plugin_cert *pc, const time_t cur_ts)
 {
-    if (pc->ssl_stapling && pc->ssl_stapling_nextts - 256 > cur_ts)
-        return 1; /* skip check for refresh unless close to expire */
+//    if (pc->ssl_stapling && pc->ssl_stapling_nextts - 256 > cur_ts)
+//        return 1; /* skip check for refresh unless close to expire */
     struct stat st;
     if (0 != stat(pc->ssl_stapling_file->ptr, &st)
         || st.st_mtime <= pc->ssl_stapling_loadts) {

Actions #11

Updated by flynn 7 months ago

With this patch I see periodic stat calls in strace and openat of the stapling file after a change of the timestamp with touch.

Actions #12

Updated by gstrauss 7 months ago

The stat() once every 64-seconds is fairly trivial, so you're welcome to leave that patch in place and see if the new file is loaded after Next Update: Jan 19 11:00:00 2021 GMT (the "Next Update" timestamp you posted above)

As I posted above, I traced and tested individual pieces of the workflow and I have not been able to reproduce the issue.

If the patch you are using works after Jan 19, then maybe there is an issue somewhere with pc->ssl_stapling_nextts and I might consider an unconditional stat() check of ssl.stapling-file once an hour or so.

Actions #13

Updated by flynn 7 months ago

Should I add a log message to monitor pc->ssl_stapling_nextts in the error log?

Actions #14

Updated by gstrauss 7 months ago

Should I add a log message to monitor pc->ssl_stapling_nextts in the error log?

You may. That's quite safe, especially if only done once every 64 seconds.

For my own troubleshooting: are you running 64-bit lighttpd executable or 32-bit? In either case, time_t should be a signed integral type.

Actions #15

Updated by flynn 7 months ago

I'm running 64-bit lighttpd (amd64).

I run now with patches (no condition, logging pc->ssl_stapling_nextts) until the end of Next Update on Jan 19.

I report again either if pc->ssl_stapling_nextts changes ahead of time or on Jan 19.

Actions #16

Updated by gstrauss 7 months ago

Ok. Thanks.

To pretty-print the pc->ssl_stapling_nextts:

date --date="@1611327599"

(fill in your timestamp)

Actions #17

Updated by flynn 7 months ago

After an update of the ssl.stapling-file by the cron-job pc->ssl_stapling_nextts has been updated too. The OCSP-response with openssl s_client -status ... shows an updated Next Update: which macthes the new value of pc->ssl_stapling_nextts. This means that the the reload of the ssl.stapling-file works, the problem is maybe in the condition. I reeenable the if-statement and add an log message for the condition to debug this ...

Actions #18

Updated by flynn 7 months ago

One question: every 64sec I expect one log message per certificate, but I get 5 log messages for two certificates (3 for the first certifcate, 2 for the second). Is this normal/intended?

Actions #19

Updated by gstrauss 7 months ago

One question: every 64sec I expect one log message per certificate, but I get 5 log messages for two certificates (3 for the first certifcate, 2 for the second). Is this normal/intended?

Do you have the two certificates listed multiple times in the config? e.g. one for IPv4 and one for IPv6 addresses? lighttpd loads the certificates into its list each time it sees the ssl.pemfile directive.

If certificates are listed in the global scope, then any $SERVER["socket"] containing ssl.engine = "enable" but no other ssl.* directives inherits the ssl.* directives from the global scope.For SNI, each certificate is associated with the SNI. ($HTTP["host"]). If repeated certificate loads will be a scaling issue, I can look into deduplicating them across the entire config. Right now, like most other directives, they are handled in-scope, as they occur.

This means that the the reload of the ssl.stapling-file works, the problem is maybe in the condition.

Possibly, but it does not jump out at me by staring at those two lines of code. However, the bug has to be somewhere since you have been able to reproduce it in the past.

Actions #20

Updated by flynn 7 months ago

Yes I have listed the certificates multiple times, one for IPv4 und one for IPv6.
I have changed that (global loading of certificates, $SERVER["socket"] contains only ssl.engine = "enable") and now I have only two log messages for two certifcates.
Let's wait if this solves the OCSP reload problem ...

Actions #21

Updated by gstrauss 6 months ago

Anything interesting in your logs now that the original stapling file has expired and a new one should have been read?

Actions #22

Updated by flynn 6 months ago

No yet, next "Next Update" is on Jan, 24th. I did not manage to create ocsp responses with less than 7 days.

Actions #23

Updated by gstrauss 6 months ago

After spending quite a bit of time trying to reproduce this, I think that the stapling file refresh code is pretty solid.

However, I did find a bug in walking the config, and that is what resulted in some of the repeated processing that you saw with your original config. The first ssl.stapling-file might be checked for refresh twice in some circumstances.

I haven't yet been able to come up with a config where this might cause, say, the last ssl.stapling-file to be skipped and not checked for refresh, but maybe that is what was happening with your original config. With your original config, are some ssl.stapling-file directives checked for refresh, and others not checked? If you strace (or add logging), you should see a refresh check for each and every ssl.stapling-file directive, even if the same stapling file path is repeated in different conditions.

Here's a patch for mod_openssl. A similar patch needs to be applied to the other TLS modules and to mod_auth and mod_vhostdb cache cleanup. This bug affects only the ssl.stapling-file refresh, and mod_auth and mod_vhostdb cache cleanups (all new in lighttpd 1.4.56)

--- a/src/mod_openssl.c
+++ b/src/mod_openssl.c
@@ -1555,7 +1555,9 @@ mod_openssl_refresh_stapling_files (server *srv, const plugin_data *p, const tim
 {
     /* future: might construct array of (plugin_cert *) at startup
      *         to avoid the need to search for them here */
-    for (int i = 0, used = p->nconfig; i < used; ++i) {
+    /* (init i to 0 if global context; to 1 to skip empty global context) */
+    if (NULL == p->cvlist) return;
+    for (int i = !p->cvlist[0].v.u2[1], used = p->nconfig; i < used; ++i) {
         const config_plugin_value_t *cpv = p->cvlist + p->cvlist[i].v.u2[0];
         for (; cpv->k_id != -1; ++cpv) {
             if (cpv->k_id != 0) continue; /* k_id == 0 for ssl.pemfile */

Actions #24

Updated by flynn 6 months ago

The update of ssl.stapling-file succeeded now, about 5min before the end of "Next update" I see a log message reading the new file, a test with openssl s_client ... shows a valid OCSP response with a new "Next Update" 6 days in future.

Loading certificates only once in global scope solves for me this issue, maybe a hint in the documentation should be added.

Actions #25

Updated by gstrauss 6 months ago

Thanks for the update.

I think the issue had to do with somehow incorrectly walking the configuration, not that the ssl.stapling-file and ssl.pemfile must be in the global config. After all, multiple values for those directives are supported in $SERVER["socket"] and $HTTP["host"]

Actions #26

Updated by gstrauss 6 months ago

  • Status changed from New to Fixed
Actions #27

Updated by gstrauss 6 months ago

While I have not been able to reproduce the issue with OCSP stapling not reloading, I did fix the issue with config walking and also made some minor adjustments to code just in case time_t is unsigned.

I am marking this as fixed, but if this is still an issue for someone, please re-open.

Actions

Also available in: Atom