Bug #2405

Crash parsing configuration: *** glibc detected *** /usr/sbin/lighttpd: malloc(): memory corruption (fast): 0x0a0354d8 ***

Added by DavidAnderson over 2 years ago. Updated over 2 years ago.

Status: Fixed
Start date: 2012-04-06
Priority: High
Due date:
Assignee: -
% Done: 100%
Category: core
Target version: -
Missing in 1.5.x: No

Description

lighttpd version: 1.4.30 from CentOS 5 EPEL testing (http://dl.fedoraproject.org/pub/epel/testing/5/i386/lighttpd-1.4.30-1.el5.i386.rpm).

Same problem with 1.4.28 from EPEL current.

I have a working configuration file. Adding the following fragment (domain name changed) causes it to fail:

    # mumia.example.org
    $HTTP["host"] =~ "^(www.)?mumia.example.org$" {
        server.document-root = "/var/www/webusers/example.org/htdocs/mumia"
    }

It does not appear to be related to the fragment; if I keep that fragment but delete something else, then it works again.

      # /usr/sbin/lighttpd -t -f lighttpd.conf.faulty
      *** glibc detected *** /usr/sbin/lighttpd: malloc(): memory corruption (fast): 0x092d8490 ***
      ======= Backtrace: =========
      /lib/libc.so.6[0x326da6]
      /lib/libc.so.6(__libc_calloc+0xba)[0x327cda]
      /usr/sbin/lighttpd(array_init+0x1a)[0x806421a]
      /usr/sbin/lighttpd(data_config_init+0x3a)[0x806635a]
      /usr/sbin/lighttpd(configparser+0x10ec)[0x805a6ac]
      /usr/sbin/lighttpd[0x8056a08]
      /usr/sbin/lighttpd(config_parse_file+0xd1)[0x8057881]
      /usr/sbin/lighttpd(config_read+0x186)[0x8057a66]
      /usr/sbin/lighttpd(main+0x40b)[0x804e22b]
      /lib/libc.so.6(__libc_start_main+0xdc)[0x2d1eac]
      /usr/sbin/lighttpd[0x804d901]
      ======= Memory map: ========
      00110000-00154000 r-xp 00000000 08:03 22405861 /lib/libssl.so.0.9.8e
      00154000-00158000 rw-p 00043000 08:03 22405861 /lib/libssl.so.0.9.8e
      00158000-0015a000 r-xp 00000000 08:03 21594313 /lib/libcom_err.so.2.1
      0015a000-0015b000 rw-p 00001000 08:03 21594313 /lib/libcom_err.so.2.1
      0016d000-0017f000 r-xp 00000000 08:03 21594370 /lib/libz.so.1.2.3
      0017f000-00180000 rw-p 00011000 08:03 21594370 /lib/libz.so.1.2.3
      0024f000-00252000 r-xp 00000000 08:03 76918968 /lib/libdl-2.5.so
      00252000-00253000 r--p 00002000 08:03 76918968 /lib/libdl-2.5.so
      00253000-00254000 rw-p 00003000 08:03 76918968 /lib/libdl-2.5.so
      002b5000-002bb000 r-xp 00000000 08:03 121473680 /usr/lib/libfam.so.0.0.0
      002bb000-002bc000 rw-p 00006000 08:03 121473680 /usr/lib/libfam.so.0.0.0
      002bc000-0040f000 r-xp 00000000 08:03 76918962 /lib/libc-2.5.so
      0040f000-00411000 r--p 00153000 08:03 76918962 /lib/libc-2.5.so
      00411000-00412000 rw-p 00155000 08:03 76918962 /lib/libc-2.5.so
      00412000-00415000 rw-p 00412000 00:00 0
      00532000-0056d000 r-xp 00000000 08:03 21594361 /lib/libsepol.so.1
      0056d000-0056e000 rw-p 0003b000 08:03 21594361 /lib/libsepol.so.1
      0056e000-00578000 rw-p 0056e000 00:00 0
      005e3000-005e5000 r-xp 00000000 08:03 21594340 /lib/libkeyutils-1.2.so
      005e5000-005e6000 rw-p 00001000 08:03 21594340 /lib/libkeyutils-1.2.so
      0071b000-00736000 r-xp 00000000 08:03 76917950 /lib/ld-2.5.so
      00736000-00737000 r--p 0001a000 08:03 76917950 /lib/ld-2.5.so
      00737000-00738000 rw-p 0001b000 08:03 76917950 /lib/ld-2.5.so
      00884000-0089a000 r-xp 00000000 08:03 21594359 /lib/libselinux.so.1
      0089a000-0089c000 rw-p 00015000 08:03 21594359 /lib/libselinux.so.1
      008b5000-008e1000 r-xp 00000000 08:03 80183409 /usr/lib/libgssapi_krb5.so.2.2
      008e1000-008e2000 rw-p 0002c000 08:03 80183409 /usr/lib/libgssapi_krb5.so.2.2
      00909000-00914000 r-xp 00000000 08:03 22405589 /lib/libgcc_s-4.1.2-20080825.so.1
      00914000-00915000 rw-p 0000a000 08:03 22405589 /lib/libgcc_s-4.1.2-20080825.so.1
      00a7d000-00a7e000 r-xp 00a7d000 00:00 0 [vdso]
      00b3d000-00b4d000 r-xp 00000000 08:03 76919005 /lib/libresolv-2.5.so
      00b4d000-00b4e000 r--p 0000f000 08:03 76919005 /lib/libresolv-2.5.so
      00b4e000-00b4f000 rw-p 00010000 08:03 76919005 /lib/libresolv-2.5.so
      00b4f000-00b51000 rw-p 00b4f000 00:00 0
      00bd7000-00bf6000 r-xp 00000000 08:03 21594354 /lib/libpcre.so.0.0.1
      00bf6000-00bf7000 rw-p 0001e000 08:03 21594354 /lib/libpcre.so.0.0.1
      00c1f000-00cb3000 r-xp 00000000 08:03 80183423 /usr/lib/libkrb5.so.3.3
      00cb3000-00cb6000 rw-p 00093000 08:03 80183423 /usr/lib/libkrb5.so.3.3
      00d9c000-00dc2000 r-xp 00000000 08:03 80183413 /usr/lib/libk5crypto.so.3.1
      00dc2000-00dc3000 rw-p 00025000 08:03 80183413 /usr/lib/libk5crypto.so.3.1
      00e21000-00f4b000 r-xp 00000000 08:03 22405859 /lib/libcrypto.so.0.9.8e
      00f4b000-00f5f000 rw-p 00129000 08:03 22405859 /lib/libcrypto.so.0.9.8e
      00f5f000-00f62000 rw-p 00f5f000 00:00 0
      00f7a000-00f82000 r-xp 00000000 08:03 80183425 /usr/lib/libkrb5support.so.0.1
      00f82000-00f83000 rw-p 00007000 08:03 80183425 /usr/lib/libkrb5support.so.0.1
      08048000-08074000 r-xp 00000000 08:03 80186672 /usr/sbin/lighttpd
      08074000-08075000 rw-p 0002c000 08:03 80186672 /usr/sbin/lighttpd
      09150000-092dc000 rw-p 09150000 00:00 0 [heap]
      b7d00000-b7d21000 rw-p b7d00000 00:00 0
      b7d21000-b7e00000 ---p b7d21000 00:00 0
      b7eaa000-b7ef7000 r--s 00000000 08:03 76919039 /etc/lighttpd/lighttpd.conf.faulty
      b7ef7000-b7efc000 rw-p b7ef7000 00:00 0
      bf939000-bf94e000 rw-p bffe8000 00:00 0 [stack]
      Aborted

I can email you the full configuration files off-line if needed. (The server has a few hundred sites on it, all working well.)

The configuration file is 305 KB, so perhaps it is related to the size; however, I believe it used to be twice as big (I did some work to reduce it a few months ago).

I changed my glibc version to one from 2 months ago (from the CentOS repositories), but both versions gave the same crash.

valgrind.txt - Valgrind -v --leak-check=yes output (142 KB), DavidAnderson, 2012-04-07 08:54

valgrind-withsymbols.txt (140 KB), DavidAnderson, 2012-04-07 17:06

Associated revisions

Revision 2828
Added by stbuehler over 2 years ago

buffer_caseless_compare: always convert letters to lowercase to get transitive results, fixing array lookups (fixes #2405)

Revision 2829
Added by stbuehler over 2 years ago

buffer_caseless_compare: always convert letters to lowercase to get transitive results, fixing array lookups (fixes #2405)

History

#1 Updated by nitrox over 2 years ago

  • Priority changed from Urgent to Normal

"Urgent" and up is reserved for critical bugs.

#2 Updated by DavidAnderson over 2 years ago

OK, thanks for the info. For future reference, what counts as critical? lighttpd not being able to parse a valid config file, and hence dying before running, was pretty critical for me!

#3 Updated by DavidAnderson over 2 years ago

I've done some further testing.

It's not the size of the file directly - I managed to cut about 20 KB (305 KB -> 284 KB) by factoring out some common elements. No difference in the outcome, except that now I sometimes get a plain segfault instead of glibc detecting the problem and aborting.

Even an empty fragment triggers the problem, e.g.
$HTTP["host"] =~ "^(www.)?mumia.example.org$" {
}
i.e. with the empty fragment, lighttpd blows up; without it, it doesn't.

It seems to be related to the number of fragments; if I remove one, then all is OK. I seem to have approx. 750 $HTTP["host"] =~ sections in the file. Not all fragments are equal, however: removing some of them leaves the segfault in place, while removing others stops it. There is no clear pattern I can see; they are all regexes of the same kind of format.

#4 Updated by DavidAnderson over 2 years ago

I think I've done as much testing as I can now - I also discovered that:

  • I could remove about 40% of the $HTTP["host"] sections and still get the crash, but I cannot work out which ones matter. As said before, I can remove just one section and avoid the crash, if I remove the right one.
  • I copied the config over to another machine to test it there; same results.

#5 Updated by DavidAnderson over 2 years ago

I found the difference between crashing and not crashing....

I can take away 200 sections and it still crashes; but if I remove or even rewrite just one section that matches a certain pattern then the crashing stops.

If I add in ONE more section like:

$HTTP["host"] =~ "^(www.)?.example.org$" {

... then it crashes.

However if I add in:

$HTTP["host"] =~ "^www.example.org$" {
or even this one, which of course is equivalent to the one that crashes (so that's my work-around for now):
$HTTP["host"] =~ "^www.example.org$|^example.org$" {

... then no crash.

In an effort to get a simpler reproducible test case, I wrote a shell script that outputs 5000 sections like this:

$HTTP["host"] =~ "^(www.)?1.example.com$|^frubble1.zuggy.example.com$" { # Nothing to see here.
}

$HTTP["host"] =~ "^(www.)?2.example.com$|^frubble2.zuggy.example.com$" { # Nothing to see here.
}

etc.

However, that didn't produce a crash, even when added to the end of my base config (i.e. the part of my config that comes before the config for any individual website, before all my $HTTP["host"] =~ sections). So the heap corruption is more subtle than just the number of such sections. Too subtle for someone like me who knows no C.

Anything else I can do to help, or does that give you enough data to work on?

#6 Updated by DavidAnderson over 2 years ago

Nope, it's more temperamental still.

Changing one such section fixed things.

But when I changed my code to output all sections in that new style, it was back to crashing.

Now in the new style, I find that changing just one section from:

$HTTP["host"] =~ "^www.example.org" {

to:

$HTTP["host"] =~ "www.example.org" {

is the difference between crashing and not crashing. (The difference is the ^, in case you didn't spot it.)

#7 Updated by stbuehler over 2 years ago

You could run it with valgrind; that might show the source of the memory corruption.

#8 Updated by DavidAnderson over 2 years ago

Thank you... I've attached the output of testing a known-bad config file (that produces the glibc abort):

valgrind -v --leak-check=yes lighttpd -t -f lighttpd.conf.faulty

Finishes with:
==27998== ERROR SUMMARY: 642 errors from 67 contexts (suppressed: 43 from 8)

#9 Updated by DavidAnderson over 2 years ago

For comparison, the presently running config file (one that doesn't cause a crash) ends with:

==28573== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 43 from 8)

#10 Updated by stbuehler over 2 years ago

Hm, strange.

  • It would be helpful if you could get debug symbols for lighttpd (compile with -g; perhaps there is a separate package for them).
  • Do you/CentOS apply patches we don't have upstream?
  • Perhaps you could share a "faulty" config; I guess it is triggered by the conditionals, so you can probably remove all the options. (If you don't want it public you can mail it to or )
  • Are you using "global { ... }" blocks?

#11 Updated by DavidAnderson over 2 years ago

Thank you...

1) I installed the debuginfo package. A new valgrind output is attached.

This is for 1.4.28 from EPEL current. (EPEL = Extra Packages for Enterprise Linux - packages from the Fedora community for RHEL 5 and all clones, e.g. CentOS.) As I say, I got the same problem in 1.4.30 from EPEL testing (but I had other problems with that one, which I reported to bugzilla.redhat.com, and then downgraded). So I assume the issue is the same in both.

2) I downloaded the EPEL source package. There was only one patch, which was to enable mod_geoip.

3) I'll email you the config file that produced this output. Please let me know if you don't get it immediately after I post this. It's 300 KB, though I got the same problems with files under 200 KB (it used to be over 500 KB until I did some refactoring a few months ago; with more refactoring yesterday whilst testing, I got it below 200 KB).

4) No, I wasn't aware of the existence of global { } until I was googling yesterday to look into this problem and found it mentioned in a different bug.

#12 Updated by DavidAnderson over 2 years ago

One other piece of info - the bug seems widely reproducible. With the same config file you get the same error every time on CentOS 5 32-bit, CentOS 6 64-bit and Fedora 16 64-bit.

#13 Updated by stbuehler over 2 years ago

  • Priority changed from Normal to High

OK, I found the problem: buffer_caseless_compare isn't transitive:

(gdb) p buffer_caseless_compare("B", 1, "_", 1)
$1 = -29
(gdb) p buffer_caseless_compare("_", 1, "a", 1)
$2 = -2
(gdb) p buffer_caseless_compare("B", 1, "a", 1)
$3 = 1

So "B" < "_" and "_" < "a", but "B" > "a". buffer_caseless_compare is used to build binary trees to store the conditional blocks (the keys are the "paths" of conditions to a block).

In some cases the parser doesn't detect that you used a condition twice, and tries to insert it a second time. In the first tree this doesn't matter much: it just inserts a second entry with the same key (it didn't find the already existing entry!). But a second array detects the duplicate entry and frees the first, leading to dangling references and memory corruption. (The funny part: the second tree isn't actually used.)

These trees (called "array" in the code...) are used for other things too (like HTTP headers). As one entry is usually only used in one array, there shouldn't be any dangling references there, but I haven't checked the other usages yet.

#14 Updated by stbuehler over 2 years ago

  • Status changed from New to Fixed
  • % Done changed from 0 to 100

Applied in changeset r2828.

#15 Updated by DavidAnderson over 2 years ago

Thanks very much. Works for me now (I recompiled 1.4.28 current from EPEL with that extra patch).

#16 Updated by stbuehler over 2 years ago

Nice!

Thank you very much for the bug report and all the details :)
