Bug #2464
closedpatch for intermittent ldap failures
Description
I was encountering intermittent auth failures using ldap. I tracked the issue to LDAP syscalls not recovering from EINTR properly. Setting LDAP_OPT_RESTART allows LDAP to retry the syscall in MOST cases.
Some libdldap syscalls are not EINTR guarded (presently, connect() in ldap_pvt_connect()) and so you can still expect to see intermittent failures from that. Only patching libldap will fix those issues.
Note that my testing platform is an extremely slow processor and is perfect at finding these kinds of conditions.
Files
Updated by codehero almost 12 years ago
OpenLDAP incorporated patch into its master to guard the connect() call against the EINTR signal.
http://www.openldap.org/its/index.cgi/Software%20Bugs?id=7476
In my testing so far (with both patched lighttpd and OpenLDAP) I have not experienced a single connection issue. Please incorporate this patch.
Updated by stbuehler over 11 years ago
- Target version set to 1.4.x
I'm not sure LDAP_OPT_RESTART has no negative side effects. Would it keep restarting in a loop until it gets a response (and therefore blocking all request handling)? Does it fail after a fixed amount of tries?
Updated by codehero over 11 years ago
LDAP_OPT_RESTART is only checked when syscalls fail due to system interference (ie from a signal. If you looked at the openldap patches I linked you would have seen:
+ } while(err == EINTR && + LDAP_BOOL_GET( &ld->ld_options, LDAP_BOOL_RESTART ));
So my system had intermittent auth failures because external CGI processes exited sending SIGLCHLD while openldap's connect() call was taking place.
You don't have an infinite loop without infinite signals interrupting the lighttpd process.
Updated by stbuehler over 11 years ago
The patches might touch only EINTR stuff with LDAP_OPT_RESTART, but given the docs for the option it was not designed for this (although the doc also says "FIXME").
But did you check the complete openldap code, and did you check all versions? Also, does LDAP_OPT_RESTART have a version requirement?
Also infinite loops on EINTR are bad; for example there could be SIGALRM... - what happens if the connect() would timeout, but SIGALRM restarts it every second?
Updated by codehero over 11 years ago
I did not check all the code openldap code for LDAP_OPT_RESTART and given how much of a pile the openldap code is, I won't. I don't know about any version requirements, as I was only interested in solving this issue for my specific application.
I did not consider SIGALRM, but unless the signalling period is less than the time it takes for connect() to complete I do not see the issue. I was using a custom local LDAP server so this was not an issue for me. I would NOT recommend using a remote LDAP server...
FWIW, I have not seen a single lockup with my code yet. I have made my own (very poor) LDAP server and bugs in my server caused lighty to hang up, but this was without the LDAP_OPT_RESTART change. The takeaway here is that using openldap in the first place is playing with fire.
Updated by gstrauss almost 9 years ago
Instead of LDAP_OPT_RESTART, the following patch modifies the flags to sigaction for SIGCHLD to restart those system calls that are restartable, instead of causing those system calls to return EINTR. See 'man 7 signal' for list of restartable system calls.
This patch only works on systems which support sigaction() instead of only supporting signal(), but most modern systems support sigaction().
diff --git a/src/server.c b/src/server.c index 26ecbcc..351503d 100644 --- a/src/server.c +++ b/src/server.c @@ -973,6 +973,7 @@ int main (int argc, char **argv) { sigaction(SIGTERM, &act, NULL); sigaction(SIGHUP, &act, NULL); sigaction(SIGALRM, &act, NULL); + act.sa_flags |= SA_RESTART | SA_NOCLDSTOP; sigaction(SIGCHLD, &act, NULL); #elif defined(HAVE_SIGNAL)
Updated by stbuehler almost 9 years ago
- Target version changed from 1.4.x to 1.4.40
I'll go with the patch from gstrauss (thanks), as I don't want to search for how to check for the presence of LDAP_OPT_RESTART
(which probably would fix the problem too).
Updated by stbuehler almost 9 years ago
- Status changed from Patch Pending to Fixed
- % Done changed from 0 to 100
Applied in changeset r3108.
Updated by gstrauss over 8 years ago
- Related to Bug #2345: LDAP authentication problem added
Updated by gstrauss almost 7 years ago
- Related to Bug #2846: LDAP authentication vs. AD: problems with referrals added
Updated by gstrauss 12 months ago
- Related to Feature #2294: add ldap referrals support added
Also available in: Atom