Bug #2086

mod_status has too many 'close' states

Added by akrus about 5 years ago. Updated about 5 years ago.

Status:FixedStart date:2009-10-19
Priority:NormalDue date:
Assignee:-% Done:

100%

Category:mod_status
Target version:1.4.24
Missing in 1.5.x:No

Description

Hello!

I've updated to 1.4.24 RC yesterday (from the stbuehler repository for debian) and the problem appeared in mod_status: http://bit.ly/UOIj5
As you can see there're many connection in 'close' state. I'm not sure it affects something else, but looks really strange :)


Related issues

Related to Bug #657: lighty vs. apt-get - problem with pipelining Fixed

Associated revisions

Revision 2669
Added by stbuehler about 5 years ago

Fix linger close timeout handling, drop timeout to 5 seconds (fixes #2086)

History

#1 Updated by stbuehler about 5 years ago

Hm, looks like the timeout didn't work; they should be closed after 30 seconds.

#2 Updated by akrus about 5 years ago

According to netstat there're many 'FIN_WAIT2's.

#3 Updated by stbuehler about 5 years ago

Which fd-event handler are you using?

And it would be nice if you could have a look at strace, and perhaps paste a little bit from it - it would be interesting to see if the FIN_WAIT2 fds gets signaled "active" (it should signal EOF as POLL_IN or POLL_HUP).

#4 Updated by akrus about 5 years ago

I think fd-event handler is default one (I didn't specify anything in config). According to strace it's epoll.

Could you give some more instructions on strace? strace -p <pid> returns many strings, but I have no idea what to look for, the is no 'EOF' :)

#5 Updated by akrus about 5 years ago

Maybe:

epoll_ctl(6, EPOLL_CTL_MOD, 153, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=153, u64=153}}) = 0                                                         
epoll_ctl(6, EPOLL_CTL_MOD, 153, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=153, u64=153}}) = 0                                                         
epoll_ctl(6, EPOLL_CTL_MOD, 153, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=153, u64=153}}) = 0                                                         
epoll_ctl(6, EPOLL_CTL_MOD, 153, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=153, u64=153}}) = 0                                                         
epoll_ctl(6, EPOLL_CTL_MOD, 153, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=153, u64=153}}) = 0                                                         
epoll_ctl(6, EPOLL_CTL_MOD, 153, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=153, u64=153}}) = 0

#6 Updated by stbuehler about 5 years ago

You could search for fds matching FIN_WAIT2 connections with lsof; this one looks like an active connection to me (it asks for EPOLLOUT, so it wants to write something). In "close" state it should ask for EPOLLIN|EPOLLERR|EPOLLHUP. EPOLL_CTL_MOD means it is setting that flags, epoll_wait() calls are probably more interesting.

#7 Updated by akrus about 5 years ago

epoll_ctl(6, EPOLL_CTL_MOD, 161, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=161, u64=161}}) = 0                                                          
epoll_wait(6, {{EPOLLIN, {u32=161, u64=161}}}, 1025, 1000) = 1              

epoll_ctl(6, EPOLL_CTL_MOD, 153, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=153, u64=153}}) = 0                                                          
epoll_ctl(6, EPOLL_CTL_MOD, 153, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=153, u64=153}}) = 0                                                          
epoll_wait(6, {{EPOLLOUT, {u32=135, u64=135}}}, 1025, 1000) = 1                                      

Something like this~

lsof and lsof -i (with | grep FIN_WAIT2) returns nothing...

#8 Updated by stbuehler about 5 years ago

Ok, i think i can reproduce it. Just make a request with a keep-alive client (and enable keep-alive on the server), and wait for the timeout:

13:06:23.327767 accept(3, {sa_family=AF_INET, sin_port=htons(55370), sin_addr=inet_addr("127.0.0.1")}, [8135381057554874384]) = 6                                                                                                                                             
13:06:23.327994 brk(0x179c000)          = 0x179c000                                                                                                                                                                                                                           
13:06:23.328300 brk(0x17bd000)          = 0x17bd000                                                                                                                                                                                                                           
13:06:23.328594 fcntl(6, F_SETFD, FD_CLOEXEC) = 0                                                                                                                                                                                                                             
13:06:23.328642 fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0                                                                                                                                                                                                                      
13:06:23.328694 ioctl(6, FIONREAD, [429]) = 0                                                                                                                                                                                                                                 
13:06:23.328754 read(6, "GET /test.txt HTTP/1.1\r\nHost: 127.0.0.1:8081\r\nConnection: Keep-Alive\r\n"..., 447) = 429                                                
13:06:23.328975 stat(".../test.txt", {st_mode=S_IFREG|0644, st_size=328, ...}) = 0                                                                                                                                                        
13:06:23.329074 open(".../test.txt", O_RDONLY) = 7                                                                                                                                                                                        
13:06:23.329137 close(7)                = 0
13:06:23.329287 setsockopt(6, SOL_TCP, TCP_CORK, [1], 4) = 0
13:06:23.329348 writev(6, [{"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nAccept-Ranges: bytes\r\nETag: \"2301341072\"\r\nLast-Modified: Tue, 05 Feb 2008 22:45:31 GMT\r\nContent-Length: 328\r\nDate: Mon, 19 Oct 2009 11:06:22 GMT\r\nServer: lighttpd/1.4.24\r\n\r\n"..., 216}], 1) = 216
13:06:23.329448 open(".../test.txt", O_RDONLY) = 7
13:06:23.329511 fcntl(7, F_SETFD, FD_CLOEXEC) = 0
13:06:23.329559 sendfile(6, 7, [0], 328) = 328
13:06:23.329614 close(7)                = 0
13:06:23.329661 setsockopt(6, SOL_TCP, TCP_CORK, [0], 4) = 0
13:06:23.329743 ioctl(6, FIONREAD, [0]) = 0
13:06:23.329802 read(6, 0x17bacd0, 4159) = -1 EAGAIN (Resource temporarily unavailable)
13:06:23.329882 epoll_ctl(5, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=6, u64=6}}) = 0
13:06:23.329941 accept(3, 0x7ffffa63a800, [8135381057554874480]) = -1 EAGAIN (Resource temporarily unavailable)
13:06:23.330009 epoll_ctl(5, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=6, u64=6}}) = 0
13:06:23.330071 epoll_wait(5, {}, 1025, 1000) = 0
13:06:24.330132 epoll_wait(5, {}, 1025, 1000) = 0
13:06:25.330134 epoll_wait(5, {}, 1025, 1000) = 0
13:06:26.330097 epoll_wait(5, {}, 1025, 1000) = 0
13:06:27.330094 epoll_wait(5, {}, 1025, 1000) = 0
13:06:28.330171 shutdown(6, 1 /* send */) = 0
13:06:28.330286 ioctl(6, FIONREAD, [0]) = 0
13:06:28.330357 epoll_ctl(5, EPOLL_CTL_MOD, 6, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=6, u64=6}}) = 0
13:06:28.330423 epoll_wait(5, {}, 1025, 1000) = 0
[nothing happens now...]
13:07:13.329978 epoll_wait(5, {}, 1025, 1000) = 0
13:07:14.330096 epoll_wait(5, ^C <unfinished ...>

(This is with a keepalive-idle of 5 seconds)
You can see that lighty closes the "output" of the socket after the keep-alive timeout (shutdown), but it never sees an EOF from the client (that would be read(...) = 0); and it looks like epoll_wait doesn't detect EOF as input activity.

#9 Updated by stbuehler about 5 years ago

Just a small detail: FIN_WAIT2 means the client didn't close its end of the connection, so lighttpd waits for a FIN (a read() would return EAGAIN). That is no reason not to close it after the timeout (i will fix that), but it indicates that many clients don't handle this in a clean way. I will therefor drop the HTTP_LINGER_TIMEOUT to 5 seconds (was 30).

#10 Updated by stbuehler about 5 years ago

  • Status changed from New to Fixed
  • % Done changed from 0 to 100

Applied in changeset r2669.

Also available in: Atom