Bug #2940

mod_authn_ldap/mod_cgi race condition, "Can't contact LDAP server"

Added by bjornfor 2 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: mod_auth
Target version:
Start date: 2019-03-11
Due date:
% Done: 0%
Estimated time:
Missing in 1.5.x:

Description

There seems to be a race condition bug somewhere between mod_authn_ldap and mod_cgi which manifests itself in

(mod_authn_ldap.c.449) ldap: ldap_sasl_bind_s(): Can't contact LDAP server

messages from lighttpd. This happens when lighttpd gets hit with multiple parallel requests for URLs that require LDAP auth and are served by CGI scripts.

I noticed this problem when doing a git clone from a cgit instance served by lighttpd. The clone operation would fail on a seemingly random object, with an HTTP 401 Unauthorized error.

The problem is reproducible with lighttpd v1.4.45, 1.4.51 and master branch (commit 9232145024ae "[core] poll: fdarray uses fd as index, not fde_ndx"). These are all the versions I've tested.

What I've done:

  • Online search: "Can't contact LDAP server mod_cgi mod_authn_ldap lighttpd". The results don't seem relevant.
  • Searched this bug tracker: found an issue about "mod_auth caching" (implementing this might perhaps work around this issue).

After that I set out to reproduce and narrow down the bug. I created the following scenarios and have gotten reliable results:

git clone http://server/static-no-auth/repo1.git # success
git clone http://server/static-auth/repo1.git # success
git clone http://server/cgi-no-auth/repo1.git # success
git clone http://server/cgi-auth/repo1.git # failure

repo1.git is a dummy repo with 100 commits. With too few commits (say, 10), there is a pretty good chance of the clone completing. I've never seen the clone succeed with 100 commits.

Alternatively, instead of git clone, running a bunch of curl requests in parallel will also trigger the bug.

Even though I've spent many hours on this, I've been unable to write a proper patch. My best "fix" so far is to add a 100ms delay before lighttpd calls ldap_sasl_bind_s(). I've looked at openldap/slapd and lighttpd log files, run under gdb etc.

I set up a git repo for reproducing this bug; it can be found here: https://github.com/bjornfor/lighttpd-auth-ldap-issue.

History

#1

Updated by gstrauss 2 months ago

Thank you for taking the time to try to narrow this down.

lighttpd is single-threaded and mod_authn_ldap is blocking. It is also independent of mod_cgi. However, mod_authn_ldap holds open the connection to the ldap server for reuse. My first thought is that ldap is not setting FD_CLOEXEC on its connection fd, or lighttpd should be doing something additional when mod_cgi calls fork().
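To make the FD_CLOEXEC idea concrete, here is a minimal sketch of marking a descriptor close-on-exec with plain POSIX fcntl(). The helper names set_cloexec() and has_cloexec() are hypothetical and are not lighttpd functions. How the descriptor would be obtained from the LDAP handle is not shown; with OpenLDAP, ldap_get_option() with LDAP_OPT_DESC may be one way, but that is an assumption and untested here.

```c
#include <fcntl.h>

/* Hypothetical helper (not part of lighttpd): mark fd close-on-exec so the
 * kernel closes it in a child process when that child calls execve().
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    if (flags & FD_CLOEXEC)
        return 0; /* already set, nothing to do */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper: returns 1 if FD_CLOEXEC is set on fd,
 * 0 if it is not, and -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}
```

Note that FD_CLOEXEC only takes effect at exec time. Between fork() and exec the child still holds a copy of the descriptor, so this alone may not be the whole fix if the child touches the connection before exec.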

#2

Updated by gstrauss 2 months ago

  • Category set to mod_auth
