Bug #1245

closed

Repeatable 100% CPU usage due to remote FastCGI app misbehaviour

Added by admin over 13 years ago. Updated over 13 years ago.

Status:
Fixed
Priority:
Normal
Category:
core
Target version:

Description

Hi,

I'm writing a new FastCGI app (without using a FastCGI lib) and I got Lighttpd to eat all my CPU cycles.


accept(4, {sa_family=AF_INET, sin_port=htons(3052), sin_addr=inet_addr("192.168.0.131")}, [16]) = 6
brk(0x80d2000)                          = 0x80d2000
brk(0x80f3000)                          = 0x80f3000
fcntl64(6, F_SETFD, FD_CLOEXEC)         = 0
fcntl64(6, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
ioctl(6, FIONREAD, [444])               = 0
read(6, "GET /xbt/ HTTP/1.1\r\nHost: 192.16"..., 447) = 444
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 7
fcntl64(7, F_SETFD, FD_CLOEXEC)         = 0
fcntl64(7, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
connect(7, {sa_family=AF_INET, sin_port=htons(2711), sin_addr=inet_addr("192.168.0.131")}, 16) = -1 EINPROGRESS (Operation now in progress)
accept(4, 0xbf863918, [112])            = -1 EAGAIN (Resource temporarily unavailable)
time(NULL)                              = 1182719853
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLOUT, revents=POLLOUT}], 2, 1000) = 1
getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getsockname(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.168.0.128")}, [16]) = 0
writev(7, [{"\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\0036\0\0\17\17"..., 854}, {"\1\5\0\1\0\0\0\0", 8}], 2) = 862
time(NULL)                              = 1182719853
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}], 2, 1000) = 0
time(NULL)                              = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0])                 = 0
time(NULL)                              = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0])                 = 0
time(NULL)                              = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0])                 = 0
#1

Updated by jan over 13 years ago

  • Status changed from New to Assigned

Can you please remove line 2443 from mod_fastcgi.c and try again?


    } else {
        if (errno == EAGAIN) return 0; <-- this one
        log_error_write(...)
#2

Updated by admin over 13 years ago

I don't see ioctl returning EAGAIN in my strace. Why do you think that change will have an effect?

ioctl(7, FIONREAD, [0]) = 0

BTW, I'm trying to reproduce the issue again now, but I haven't succeeded yet. The strace indicates my app sent something back (revents=POLLIN), but I can't remember doing any writes yet.

#3

Updated by darix over 13 years ago

Because line 2443 would never be reached if errno had been set to EAGAIN earlier in this function.
That said, it must be a stale EAGAIN left over in errno from an earlier call, which shouldn't be handled here.
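
To illustrate the stale-errno problem described here, below is a minimal sketch in C. It is not lighttpd's code: `check_backend`, `demo`, and the `backend_state` enum are hypothetical names invented for this example. It assumes a Linux-like system where `FIONREAD` works on stream sockets, and it assumes the helper is only called after poll() has reported the fd as readable, so that 0 pending bytes means the peer closed the connection (which matches the POLLIN plus `FIONREAD, [0]` pattern in the strace).

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result type for this sketch; not taken from lighttpd. */
enum backend_state { BACKEND_DATA, BACKEND_EOF, BACKEND_ERROR };

/*
 * Call only after poll() reported the fd as readable.
 * errno is consulted only when ioctl() itself fails; a successful call
 * that reports 0 pending bytes is treated as end-of-file, no matter what
 * value an earlier, unrelated call left behind in errno.
 */
static enum backend_state check_backend(int fd) {
    int pending = 0;

    if (ioctl(fd, FIONREAD, &pending) == -1) {
        /* Only here does errno describe this call. */
        return BACKEND_ERROR;
    }
    if (pending == 0) {
        /* Readable but nothing to read: the peer closed the connection. */
        return BACKEND_EOF;
    }
    return BACKEND_DATA;
}

/*
 * Demo: build a socket pair, let the "backend" end either write one byte
 * or close without writing, plant a stale EAGAIN in errno, then classify.
 */
static enum backend_state demo(int peer_writes) {
    int sv[2];
    enum backend_state st;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return BACKEND_ERROR;

    if (peer_writes) {
        if (write(sv[1], "x", 1) != 1) {
            close(sv[0]);
            close(sv[1]);
            return BACKEND_ERROR;
        }
    }
    close(sv[1]);            /* backend goes away */

    errno = EAGAIN;          /* simulate a stale value from an earlier call */
    st = check_backend(sv[0]);
    close(sv[0]);
    return st;
}
```

With the stale EAGAIN planted, the closed-peer case is still classified as end-of-file, so the caller can close the fd instead of returning to poll() and spinning.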

#4

Updated by admin over 13 years ago

I've removed the line but the behaviour didn't change, it still goes in the loop.

#5

Updated by admin over 13 years ago

Actually, it did the trick, now it says:
2007-07-02 14:58:08: (mod_fastcgi.c.2463) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:192.168.0.131:2711
2007-07-02 14:58:08: (mod_fastcgi.c.3257) response not received, request sent: 862 on socket: tcp:192.168.0.131:2711 for /xbt , closing connection

#6

Updated by darix over 13 years ago

so it still loops (takes all the cpu?) but at least we reach the error message again?

#7

Updated by admin over 13 years ago

No, it properly closes the fd now too, so it doesn't loop anymore.

#8

Updated by admin over 13 years ago

Regarding my earlier comment ("I've removed the line but the behaviour didn't change, it still goes in the loop."):

This was my fault. I used the new executable but probably the old modules, because I had just pointed it to the old conf.

#9

Updated by darix over 13 years ago

  • Status changed from Assigned to Fixed
  • Resolution set to fixed

fixed in 1879

#10

Updated by admin over 13 years ago

Wouldn't it be much safer to store and use the function return value instead of the global errno, to prevent such bugs completely?

#11

Updated by darix over 13 years ago

Uhm. Many system functions use errno. lighttpd's own code doesn't use errno internally, iirc.

#12

Updated by admin over 13 years ago

Ah, you're right: "On error, -1 is returned, and errno is set appropriately."
I assumed the error itself would be returned, but with just -1 you need to use errno.
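
As a small illustration of that convention, here is a hedged sketch in C of one way to "store and use" the error: a wrapper that copies errno immediately after the failing call and returns it negated, so later calls cannot overwrite it. `read_or_neg_errno` and `demo_empty_nonblocking_read` are hypothetical names for this example, not lighttpd functions.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns the number of bytes read (>= 0), or the negated errno value
 * captured right after read() failed. The caller never has to look at
 * the global errno, so a stale value cannot be misread later.
 */
static ssize_t read_or_neg_errno(int fd, void *buf, size_t len) {
    ssize_t n = read(fd, buf, len);

    if (n == -1) {
        return -(ssize_t)errno;   /* capture immediately after the failure */
    }
    return n;
}

/*
 * Demo: a non-blocking read on a connected socket with no data pending
 * fails with -1 and errno set to EAGAIN; the wrapper turns that into
 * the return value -EAGAIN.
 */
static ssize_t demo_empty_nonblocking_read(void) {
    int sv[2];
    char buf[16];
    int flags;
    ssize_t r;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) return -(ssize_t)errno;

    flags = fcntl(sv[0], F_GETFL);
    if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        r = -(ssize_t)errno;
    } else {
        r = read_or_neg_errno(sv[0], buf, sizeof buf);
    }

    close(sv[0]);
    close(sv[1]);
    return r;
}
```

The design choice is simply to narrow the window in which errno is read to the one line directly after the call that set it.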
