Bug #1554
closed
lighttpd [1.4.18] RSTs TCP connections if mysql dies, and ceases to work
Description
FreeBSD 7.0-RC1 amd64
lighttpd-1.4.18_1
My MySQL server is crashing regularly with a bus error. While it restarts on its own, all my webservers running lighttpd cease to work properly.
A tcpdump analysis shows that lighty is just RSTing connections. FastCGI (php) doesn't work, /server-status doesn't work, and getting a bogus file (in the hopes of getting a 404) doesn't work either (so it isn't just fastcgi going down).
I do not have a ktrace and/or packet capture of this chain of events at hand, but will gladly provide one the next time this happens.
p.s: This is happening on a moderately loaded site (2500 queries/sec to mysql). The volume of connections/s to mysql must be a factor in lighty going nuts while mysql is down.
p.p.s: On very rare occasions, one or two webservers in the farm will still be functioning after mysql crashes, but the majority just RST connections. I believe this is a mere coincidence, but here's the info anyway.
-- hugo
Updated by admin about 17 years ago
In what way is MySQL being used? Just by PHP? Or also by Lighttpd?
Did you try strace to see what exactly Lighttpd is doing?
Updated by Anonymous about 17 years ago
Replying to Olaf van der Spek:
In what way is MySQL being used? Just by PHP? Or also by Lighttpd?
Did you try strace to see what exactly Lighttpd is doing?
It's being used only by PHP.
I have not tried ktracing the process yet. When I first saw it happening, though, I ran a tcpdump, and all it was doing was RSTing connection attempts (HTTP GET).
-- hugo
Updated by Anonymous about 17 years ago
18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL 
kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 
18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910) 18057 lighttpd GIO fd 5 wrote 0 bytes "" 18057 lighttpd GIO fd 5 read 0 bytes "" 18057 lighttpd RET kevent 0 18057 lighttpd CALL gettimeofday(0x7fffffffe910,0) 18057 lighttpd RET gettimeofday 0 18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910)
Probably not very helpful.
tcpdump:
2008-02-11 16:55:02.415577 IP CLIENT_IP.58233 > SERVER_IP.80: S 3849757907:3849757907(0) win 65535 <mss 1368,nop,wscale 3,sackOK,timestamp 3773677020 0>
2008-02-11 16:55:02.415595 IP SERVER_IP.80 > CLIENT_IP.58233: S 2656883143:2656883143(0) ack 3849757908 win 32768 <mss 1368,nop,wscale 5,sackOK,timestamp 4021466769 3773677020>
2008-02-11 16:55:02.667184 IP CLIENT_IP.58233 > SERVER_IP.80: . ack 1 win 8305 <nop,nop,timestamp 3773677222 4021466769>
2008-02-11 16:55:02.667193 IP SERVER_IP.80 > CLIENT_IP.58233: R 2656883144:2656883144(0) win 0
2008-02-11 16:55:02.667310 IP CLIENT_IP.58233 > SERVER_IP.80: P 1:440(439) ack 1 win 8305 <nop,nop,timestamp 3773677223 4021466769>
2008-02-11 16:55:02.667320 IP SERVER_IP.80 > CLIENT_IP.58233: R 2656883144:2656883144(0) win 0
2008-02-11 16:55:02.826970 IP CLIENT_IP.64680 > SERVER_IP.80: S 17300054:17300054(0) win 65535 <mss 1368,nop,wscale 3,sackOK,timestamp 398629228 0>
2008-02-11 16:55:02.826984 IP SERVER_IP.80 > CLIENT_IP.64680: S 3087984456:3087984456(0) ack 17300055 win 32768 <mss 1368,nop,wscale 5,sackOK,timestamp 3260113607 398629228>
2008-02-11 16:55:03.038226 IP CLIENT_IP.64680 > SERVER_IP.80: . ack 1 win 8305 <nop,nop,timestamp 398629431 3260113607>
2008-02-11 16:55:03.038234 IP SERVER_IP.80 > CLIENT_IP.64680: R 3087984457:3087984457(0) win 0
2008-02-11 16:55:03.038351 IP CLIENT_IP.64680 > SERVER_IP.80: P 1:440(439) ack 1 win 8305 <nop,nop,timestamp 398629432 3260113607>
2008-02-11 16:55:03.038361 IP SERVER_IP.80 > CLIENT_IP.64680: R 3087984457:3087984457(0) win 0
2008-02-11 16:55:03.257478 IP CLIENT_IP.55757 > SERVER_IP.80: S 1877491071:1877491071(0) win 65535 <mss 1368,nop,wscale 3,sackOK,timestamp 2248952447 0>
2008-02-11 16:55:03.257493 IP SERVER_IP.80 > CLIENT_IP.55757: S 2728777152:2728777152(0) ack 1877491072 win 32768 <mss 1368,nop,wscale 5,sackOK,timestamp 2127381702 2248952447>
2008-02-11 16:55:03.439875 IP CLIENT_IP.55757 > SERVER_IP.80: . ack 1 win 8305 <nop,nop,timestamp 2248952652 2127381702>
2008-02-11 16:55:03.439884 IP SERVER_IP.80 > CLIENT_IP.55757: R 2728777153:2728777153(0) win 0
2008-02-11 16:55:03.440125 IP CLIENT_IP.55757 > SERVER_IP.80: P 1:440(439) ack 1 win 8305 <nop,nop,timestamp 2248952652 2127381702>
2008-02-11 16:55:03.440135 IP SERVER_IP.80 > CLIENT_IP.55757: R 2728777153:2728777153(0) win 0
2008-02-11 16:55:03.653505 IP CLIENT_IP.54479 > SERVER_IP.80: S 4200448260:4200448260(0) win 65535 <mss 1368,nop,wscale 3,sackOK,timestamp 2713617084 0>
2008-02-11 16:55:03.653520 IP SERVER_IP.80 > CLIENT_IP.54479: S 763249863:763249863(0) ack 4200448261 win 32768 <mss 1368,nop,wscale 5,sackOK,timestamp 4066123162 2713617084>
2008-02-11 16:55:03.874008 IP CLIENT_IP.54479 > SERVER_IP.80: . ack 1 win 8305 <nop,nop,timestamp 2713617287 4066123162>
2008-02-11 16:55:03.874017 IP SERVER_IP.80 > CLIENT_IP.54479: R 763249864:763249864(0) win 0
2008-02-11 16:55:03.874019 IP CLIENT_IP.54479 > SERVER_IP.80: P 1:440(439) ack 1 win 8305 <nop,nop,timestamp 2713617288 4066123162>
2008-02-11 16:55:03.874028 IP SERVER_IP.80 > CLIENT_IP.54479: R 763249864:763249864(0) win 0
2008-02-11 16:55:04.055654 IP CLIENT_IP.58271 > SERVER_IP.80: S 1226742027:1226742027(0) win 65535 <mss 1368,nop,wscale 3,sackOK,timestamp 3588978442 0>
2008-02-11 16:55:04.055667 IP SERVER_IP.80 > CLIENT_IP.58271: S 3588049867:3588049867(0) ack 1226742028 win 32768 <mss 1368,nop,wscale 5,sackOK,timestamp 4196075183 3588978442>
2008-02-11 16:55:04.282277 IP CLIENT_IP.58271 > SERVER_IP.80: . ack 1 win 8305 <nop,nop,timestamp 3588978646 4196075183>
2008-02-11 16:55:04.282285 IP SERVER_IP.80 > CLIENT_IP.58271: R 3588049868:3588049868(0) win 0
2008-02-11 16:55:04.282289 IP CLIENT_IP.58271 > SERVER_IP.80: P 1:440(439) ack 1 win 8305 <nop,nop,timestamp 3588978646 4196075183>
2008-02-11 16:55:04.282297 IP SERVER_IP.80 > CLIENT_IP.58271: R 3588049868:3588049868(0) win 0
As you can see, lighty just RSTs any HTTP request after it gets stuck in this state. Restarting lighty "solves" the problem, but I wonder if there is a more elegant way of dealing with this.
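For anyone repeating this analysis on a longer capture, the pattern in the tcpdump output (SYN-ACK from the server, then RST from the server on the same connection) can be picked out mechanically. The following is only a minimal sketch: it assumes the old-style tcpdump text format shown in this ticket, one packet per line, and the function name `reset_after_handshake` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches old-style tcpdump text lines such as:
#   ... IP CLIENT_IP.58233 > SERVER_IP.80: S 3849757907:3849757907(0) ...
# Groups: source host, source port, destination host, destination port, flags.
LINE_RE = re.compile(r"IP (\S+)\.(\d+) > (\S+)\.(\d+): (\S+)")

def reset_after_handshake(lines, server_port=80):
    """Return client ports that received both a SYN-ACK and an RST
    from the server port (hypothetical helper, not part of any tool)."""
    flags_from_server = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip anything that is not a packet line
        _src_host, src_port, _dst_host, dst_port, flags = m.groups()
        if int(src_port) == server_port:
            flags_from_server[int(dst_port)].append(flags)
    return sorted(
        port for port, flags in flags_from_server.items()
        if "S" in flags and "R" in flags
    )

# First connection from the capture in this ticket.
sample = [
    "2008-02-11 16:55:02.415577 IP CLIENT_IP.58233 > SERVER_IP.80: S 3849757907:3849757907(0) win 65535 <mss 1368,nop,wscale 3,sackOK,timestamp 3773677020 0>",
    "2008-02-11 16:55:02.415595 IP SERVER_IP.80 > CLIENT_IP.58233: S 2656883143:2656883143(0) ack 3849757908 win 32768 <mss 1368,nop,wscale 5,sackOK,timestamp 4021466769 3773677020>",
    "2008-02-11 16:55:02.667184 IP CLIENT_IP.58233 > SERVER_IP.80: . ack 1 win 8305 <nop,nop,timestamp 3773677222 4021466769>",
    "2008-02-11 16:55:02.667193 IP SERVER_IP.80 > CLIENT_IP.58233: R 2656883144:2656883144(0) win 0",
]

print(reset_after_handshake(sample))
```

A port showing up in the result means the three-way handshake was answered by the server and the connection was then reset from the server side. That is the symptom described here, but it does not by itself say whether the reset came from lighttpd or from the kernel.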
-- hugo
Updated by admin about 17 years ago
Probably not very helpful.
Hehe. I don't see any occurrence of accept, so the connection request doesn't even appear to reach Lighttpd. Find out why your kernel is sending a reset.
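The "no accept" observation can be checked against a full trace rather than by eye. The sketch below tallies the syscalls in kdump-style records and reports whether accept ever appears. It assumes one record per line in the `<pid> <command> CALL <syscall>(...)` shape quoted earlier in this ticket; the function name `syscall_counts` is illustrative only.

```python
import re
from collections import Counter

# kdump-style record: "<pid> <command> CALL <syscall>(<args>)"
CALL_RE = re.compile(r"^\s*\d+\s+\S+\s+CALL\s+(\w+)\(")

def syscall_counts(kdump_lines):
    """Count how often each syscall is entered (CALL records only)."""
    counts = Counter()
    for line in kdump_lines:
        m = CALL_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# One iteration of the loop from the trace in this ticket.
trace = [
    "18057 lighttpd RET kevent 0",
    "18057 lighttpd CALL gettimeofday(0x7fffffffe910,0)",
    "18057 lighttpd RET gettimeofday 0",
    "18057 lighttpd CALL kevent(0x5,0,0,0x800a5c000,0x7f7,0x7fffffffe910)",
    '18057 lighttpd GIO fd 5 wrote 0 bytes ""',
    '18057 lighttpd GIO fd 5 read 0 bytes ""',
    "18057 lighttpd RET kevent 0",
]

counts = syscall_counts(trace)
print(dict(counts))
print("accept seen:", counts["accept"] > 0)
```

In the trace posted here kevent keeps returning 0, meaning no events were delivered to the process before the timeout. Together with the absence of accept, that is consistent with the resets being generated before the connection is ever handed to lighttpd.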
Updated by stbuehler over 16 years ago
- Status changed from New to Fixed
- Resolution set to invalid
Not a lighty issue.