Bug #1245
closed
Repeatable 100% CPU usage due to remote FastCGI app misbehaviour
Description
Hi,
I'm writing a new FastCGI app (without using a FastCGI lib) and I got Lighttpd to eat all my CPU cycles.
accept(4, {sa_family=AF_INET, sin_port=htons(3052), sin_addr=inet_addr("192.168.0.131")}, [16]) = 6
brk(0x80d2000) = 0x80d2000
brk(0x80f3000) = 0x80f3000
fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
fcntl64(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
ioctl(6, FIONREAD, [444]) = 0
read(6, "GET /xbt/ HTTP/1.1\r\nHost: 192.16"..., 447) = 444
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 7
fcntl64(7, F_SETFD, FD_CLOEXEC) = 0
fcntl64(7, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(7, {sa_family=AF_INET, sin_port=htons(2711), sin_addr=inet_addr("192.168.0.131")}, 16) = -1 EINPROGRESS (Operation now in progress)
accept(4, 0xbf863918, [112]) = -1 EAGAIN (Resource temporarily unavailable)
time(NULL) = 1182719853
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLOUT, revents=POLLOUT}], 2, 1000) = 1
getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getsockname(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.168.0.128")}, [16]) = 0
writev(7, [{"\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\0036\0\0\17\17"..., 854}, {"\1\5\0\1\0\0\0\0", 8}], 2) = 862
time(NULL) = 1182719853
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}], 2, 1000) = 0
time(NULL) = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0]) = 0
time(NULL) = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0]) = 0
time(NULL) = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0]) = 0
Updated by jan over 17 years ago
- Status changed from New to Assigned
Can you please remove line 2443 from mod_fastcgi.c and try again?
} else {
    if (errno == EAGAIN) return 0;   <-- this one
    log_error_write(...)
Updated by admin over 17 years ago
I don't see ioctl returning EAGAIN in my strace. Why do you think that change will have an effect?
ioctl(7, FIONREAD, [0]) = 0
BTW, I'm trying to reproduce the issue again now, but I haven't succeeded yet. The strace indicates my app sent something back (revents=POLLIN), but I can't remember doing any writes yet.
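One possible reading of that strace, consistent with the end-of-file message logged later in this thread: poll() also flags a stream socket as readable when the peer has closed the connection, even if nothing was ever written. The sketch below reproduces that with a local socketpair() standing in for the FastCGI connection. The helper names are hypothetical and none of this is lighttpd code.

```c
#include <assert.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not lighttpd code): returns 1 when poll() reports fd
 * as readable but FIONREAD says zero bytes are pending. */
int readable_but_empty(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int pending = -1;

    if (poll(&p, 1, 1000) != 1) return 0;                /* timeout or error */
    if (!(p.revents & (POLLIN | POLLHUP))) return 0;      /* not readable */
    if (ioctl(fd, FIONREAD, &pending) == -1) return 0;    /* ioctl failed */
    return pending == 0;
}

/* Demo: the peer closes its end without writing anything. */
int demo_peer_close(void)
{
    int sv[2];
    char byte;
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);

    close(sv[1]);                            /* peer goes away, nothing sent */
    int empty = readable_but_empty(sv[0]);   /* readable, yet 0 bytes pending */
    ssize_t n = read(sv[0], &byte, 1);       /* 0 here means end-of-file */
    close(sv[0]);
    return empty && n == 0;
}
```

If demo_peer_close() returns 1, the socket was reported readable with zero bytes pending and read() returned 0, which is the same pattern as the repeated poll()/ioctl() pair in the strace.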
Updated by darix over 17 years ago
Because line 2443 would never be reached if errno had been EAGAIN earlier in this function.
That said ... it must be an old EAGAIN errno, which shouldn't be handled here.
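The stale-errno effect can be sketched in isolation. A successful call normally leaves errno untouched, so a value left over from an earlier failure (the strace above does show accept() returning EAGAIN) is still there when it is checked later. The functions and the RESP_* constants below are hypothetical illustrations of that shape, not the actual mod_fastcgi.c code; the fixed variant assumes the caller has already seen POLLIN.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

enum { RESP_EOF = -1, RESP_RETRY = 0, RESP_DATA = 1 };

/* Buggy shape: consults errno although ioctl() itself succeeded. */
int check_pending_buggy(int fd)
{
    int pending = 0;
    if (ioctl(fd, FIONREAD, &pending) == -1) return RESP_EOF;
    if (pending > 0) return RESP_DATA;
    if (errno == EAGAIN) return RESP_RETRY;   /* stale value from an earlier call */
    return RESP_EOF;
}

/* Fixed shape: errno is only meaningful right after a call reported failure,
 * so zero pending bytes on a readable socket is treated as end-of-file. */
int check_pending_fixed(int fd)
{
    int pending = 0;
    if (ioctl(fd, FIONREAD, &pending) == -1) return RESP_EOF;
    return pending > 0 ? RESP_DATA : RESP_EOF;
}

/* Demo: peer has closed, errno still holds EAGAIN from "something earlier". */
int demo_stale_errno(int *buggy, int *fixed)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    close(sv[1]);                         /* peer closes without writing */

    errno = EAGAIN;                       /* simulate the leftover errno */
    *buggy = check_pending_buggy(sv[0]);  /* keeps asking the caller to retry */
    errno = EAGAIN;
    *fixed = check_pending_fixed(sv[0]);  /* reports end-of-file */

    close(sv[0]);
    return 0;
}
```

With the buggy variant the caller is told to retry on every poll() wakeup, and since the closed socket stays readable, that becomes a busy loop. Note that the C standard allows library functions to change errno even on success, so relying on it outside a failure path is fragile either way.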
Updated by admin over 17 years ago
I've removed the line but the behaviour didn't change, it still goes in the loop.
Updated by admin over 17 years ago
Actually, it did the trick, now it says:
2007-07-02 14:58:08: (mod_fastcgi.c.2463) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:192.168.0.131:2711
2007-07-02 14:58:08: (mod_fastcgi.c.3257) response not received, request sent: 862 on socket: tcp:192.168.0.131:2711 for /xbt , closing connection
Updated by darix over 17 years ago
So it still loops (takes all the CPU?) but at least we reach the error message again?
Updated by admin over 17 years ago
No, it properly closes the fd now too, so it doesn't loop anymore.
Updated by admin over 17 years ago
"I've removed the line but the behaviour didn't change, it still goes in the loop."
This was my fault: I used the new executable but probably the old modules (I just pointed it to the old conf).
Updated by darix over 17 years ago
- Status changed from Assigned to Fixed
- Resolution set to fixed
fixed in 1879
Updated by admin over 17 years ago
Wouldn't it be much safer to store and use the function return value instead of the global errno, to prevent such bugs completely?
Updated by darix over 17 years ago
Uhm. Many system functions use errno. lighttpd's own code doesn't use errno internally, IIRC.
Updated by admin over 17 years ago
Ah, you're right: "On error, -1 is returned, and errno is set appropriately."
I assumed the error itself would be returned, but with just -1 you need to use errno.
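For what it's worth, the "return the error itself" style can be layered on top of such calls by copying errno into the return value immediately after the failure. A minimal sketch, assuming a hypothetical wrapper name read_or_negerrno() (not a lighttpd or libc function):

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical wrapper: returns the number of bytes read (>= 0), or the
 * negated errno value if read() failed. */
long read_or_negerrno(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1) return -(long)errno;   /* capture errno right after the failure */
    return (long)n;
}

/* Demo: a non-blocking socket with nothing to read yields -EAGAIN. */
int demo_would_block(void)
{
    int sv[2];
    char byte;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return 0;

    int flags = fcntl(sv[0], F_GETFL);
    fcntl(sv[0], F_SETFL, flags | O_NONBLOCK);

    long rc = read_or_negerrno(sv[0], &byte, 1);   /* nothing was written */
    close(sv[0]);
    close(sv[1]);
    return rc == -EAGAIN || rc == -EWOULDBLOCK;
}
```

The caller then only ever looks at the returned value, so a leftover errno from some unrelated earlier call cannot be mistaken for the result of this one.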