Bug #1245: Repeatable 100% CPU usage due to remote FastCGI app misbehaviour - Lighttpd - lighty labs

Bug #1245

closed

Added by admin almost 18 years ago. Updated almost 18 years ago.

Status: Fixed
Priority: Normal
Category: core
Target version: 1.4.16

Description

Hi,

I'm writing a new FastCGI app (without using a FastCGI lib) and I got Lighttpd to eat all my CPU cycles.


accept(4, {sa_family=AF_INET, sin_port=htons(3052), sin_addr=inet_addr("192.168.0.131")}, [16]) = 6
brk(0x80d2000)                          = 0x80d2000
brk(0x80f3000)                          = 0x80f3000
fcntl64(6, F_SETFD, FD_CLOEXEC)         = 0
fcntl64(6, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
ioctl(6, FIONREAD, [444])               = 0
read(6, "GET /xbt/ HTTP/1.1\r\nHost: 192.16"..., 447) = 444
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 7
fcntl64(7, F_SETFD, FD_CLOEXEC)         = 0
fcntl64(7, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
connect(7, {sa_family=AF_INET, sin_port=htons(2711), sin_addr=inet_addr("192.168.0.131")}, 16) = -1 EINPROGRESS (Operation now in progress)
accept(4, 0xbf863918, [112])            = -1 EAGAIN (Resource temporarily unavailable)
time(NULL)                              = 1182719853
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLOUT, revents=POLLOUT}], 2, 1000) = 1
getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getsockname(6, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.168.0.128")}, [16]) = 0
writev(7, [{"\1\1\0\1\0\10\0\0\0\1\0\0\0\0\0\0\1\4\0\1\0036\0\0\17\17"..., 854}, {"\1\5\0\1\0\0\0\0", 8}], 2) = 862
time(NULL)                              = 1182719853
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}], 2, 1000) = 0
time(NULL)                              = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0])                 = 0
time(NULL)                              = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0])                 = 0
time(NULL)                              = 1182719854
poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN, revents=POLLIN}], 2, 1000) = 1
ioctl(7, FIONREAD, [0])                 = 0
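
In the trace above, poll() keeps reporting fd 7 as readable while FIONREAD keeps reporting 0 pending bytes. On a stream socket that combination usually means the peer has closed the connection, so the descriptor needs to be closed instead of being polled again. Below is a minimal sketch of telling the cases apart, assuming a POSIX system. The names `classify_readable` and `PEER_*` are hypothetical and are not taken from lighttpd.

```c
#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes, for illustration only. */
enum peer_state { PEER_IDLE, PEER_HAS_DATA, PEER_EOF, PEER_ERROR };

/* Wait up to timeout_ms for fd to become readable and report what that
 * readiness actually means. */
enum peer_state classify_readable(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int pending = 0;
    char byte;
    ssize_t r;

    int n = poll(&p, 1, timeout_ms);
    if (n < 0) return PEER_ERROR;
    if (n == 0) return PEER_IDLE;                 /* timed out, nothing to do */
    if (p.revents & POLLNVAL) return PEER_ERROR;  /* fd is not open */

    if (ioctl(fd, FIONREAD, &pending) < 0) return PEER_ERROR;
    if (pending > 0) return PEER_HAS_DATA;

    /* Readable with zero bytes pending: peek without consuming anything.
     * A return value of 0 from recv() on a stream socket means EOF. */
    r = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (r == 0) return PEER_EOF;
    if (r > 0) return PEER_HAS_DATA;
    if (errno == EAGAIN || errno == EWOULDBLOCK) return PEER_IDLE;
    return PEER_ERROR;
}
```

A caller that gets `PEER_EOF` would close the descriptor and drop it from the poll set, which is what breaks the busy loop.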

Updated by jan almost 18 years ago

Status changed from New to Assigned

Can you please remove line 2443 from mod_fastcgi.c and try again?


    } else {
        if (errno == EAGAIN) return 0; /* <-- this one */
        log_error_write(...)

Updated by admin almost 18 years ago

I don't see ioctl returning EAGAIN in my strace. Why do you think that change will have an effect?

ioctl(7, FIONREAD, [0]) = 0

BTW, I'm trying to reproduce the issue again now, but I haven't succeeded yet. The strace indicates my app sent something back (revents=POLLIN), but I can't remember doing any writes yet.

Updated by darix almost 18 years ago

Because line 2443 would never be reached if a call earlier in this function had failed with EAGAIN.
That said, it must be a stale EAGAIN left in errno by some older call, which shouldn't be handled here.
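
The stale-errno effect described here can be shown in isolation. The sketch below is a simplification and not the actual mod_fastcgi.c code: `should_retry_buggy` and `should_retry_fixed` are hypothetical helpers, and `rc` stands for the return value of whichever system call was made last. The buggy variant trusts errno even though the last call succeeded, so an EAGAIN left over from an earlier call makes it return early every time.

```c
#include <errno.h>

/* Buggy pattern: looks at errno without checking that the last call
 * actually failed. A leftover EAGAIN from an earlier call is mistaken
 * for a fresh "try again later". */
int should_retry_buggy(int rc) {
    (void)rc;  /* return value ignored, which is the bug */
    return errno == EAGAIN;
}

/* Safer pattern: errno is only meaningful when the call reported failure
 * (by convention, a return value of -1). */
int should_retry_fixed(int rc) {
    return rc == -1 && errno == EAGAIN;
}
```

With errno still holding EAGAIN and a successful call (`rc == 0`), the buggy check asks for a retry while the fixed one does not, so the code can fall through to the end-of-file handling.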

Updated by admin almost 18 years ago

I've removed the line but the behaviour didn't change, it still goes in the loop.

Updated by admin almost 18 years ago

Actually, it did the trick, now it says:
2007-07-02 14:58:08: (mod_fastcgi.c.2463) unexpected end-of-file (perhaps the fastcgi process died): pid: 0 socket: tcp:192.168.0.131:2711
2007-07-02 14:58:08: (mod_fastcgi.c.3257) response not received, request sent: 862 on socket: tcp:192.168.0.131:2711 for /xbt , closing connection

Updated by darix almost 18 years ago

so it still loops (takes all the cpu?) but at least we reach the error message again?

Updated by admin almost 18 years ago

No, it properly closes the fd now too, so it doesn't loop anymore.

Updated by admin almost 18 years ago

"I've removed the line but the behaviour didn't change, it still goes in the loop."

This was my fault, I used the new executable but the old modules (probably, I just pointed it to the old conf).

Updated by darix almost 18 years ago

Status changed from Assigned to Fixed
Resolution set to fixed

fixed in 1879

Updated by admin almost 18 years ago

Wouldn't it be much safer to store and use the function return value instead of the global errno, to prevent such bugs completely?

Updated by darix almost 18 years ago

Uhm, many system functions use errno. lighttpd's own code doesn't use errno internally, iirc.

Updated by admin almost 18 years ago

Ah, you're right: "On error, -1 is returned, and errno is set appropriately."
I assumed the error itself would be returned, but with just -1 you need to use errno.
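
For what it's worth, the "return the error itself" style can be layered on top of the -1/errno convention. A minimal sketch, assuming POSIX read(); `read_or_neg_errno` is a hypothetical wrapper, not something lighttpd provides. It captures errno immediately after the failing call and hands it back as a negative return value, so later code never has to consult the global.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: returns the number of bytes read (0 means EOF),
 * or a negative errno value on failure. errno is read right after the
 * failing call, before anything else can overwrite it. */
ssize_t read_or_neg_errno(int fd, void *buf, size_t len) {
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);  /* retry if interrupted by a signal */

    return n == -1 ? -(ssize_t)errno : n;
}
```

A caller can then write `if (n == -EAGAIN)` and there is no way for an older error to leak into the check.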
