Bug #2743

1.4.40/41 mod_proxy, mod_scgi may trigger POLLHUP on *BSD,Darwin

Added by flynn 8 months ago. Updated 6 months ago.

Status:
Fixed
Priority:
Normal
Assignee:
-
Category:
mod_proxy
Target version:
Start date:
2016-08-02
Due date:
% Done:

100%

Missing in 1.5.x:

Description

After updating to version 1.4.40, some (not all) proxied requests fail with a return value of 500.
This happens e.g. when proxying a buildbot instance, but only for static content like a CSS file.
Dynamic content, like the current state in a JSON file, still works.

A tcpdump trace shows that lighttpd closes the connection with a shutdown directly after
sending the request:

epoll_ctl(28, EPOLL_CTL_MOD, 62, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=62, u64=62}}) = 0
writev(62, [{"GET /default.css HTTP/1.0\r\nUser-"..., 185}], 1) = 185
epoll_ctl(28, EPOLL_CTL_MOD, 62, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=62, u64=62}}) = 0
shutdown(62, SHUT_WR) = 0

The difference between static and dynamic requests may be that buildbot uses setsockopt with SO_KEEPALIVE
to indicate that it may take longer. Requests for static resources do not use this setsockopt, and the shutdown
happens immediately.

So I searched for a newly introduced call to shutdown() in mod_proxy.c and found it:

https://github.com/lighttpd/lighttpd1.4/commit/923688d2da036f3cefc4fb494dcd770acaab1691

If I comment out/delete line 857 in src/mod_proxy.c:

shutdown(hctx->fd, SHUT_WR);/* future: remove if HTTP/1.1 request */

everything works for me again without any errors, as before.

mod_proxy-no-shutdown-1.4.41.patch View - revert shutdown call in mod_proxy.c (532 Bytes) flynn, 2016-08-02 08:49

mod_proxy-problem.pcap - tcpdump of proxied call closed too early (819 Bytes) flynn, 2016-08-02 09:04


Related issues

Related to Bug #2688: [PATCH] LightyTest.pm: do not shutdown on darwin Fixed 2015-12-03
Duplicated by Bug #2751: mod_proxy closes connection Duplicate 2016-09-02

Associated revisions

Revision 1de652f4 (diff)
Added by gstrauss 8 months ago

[mod_proxy,mod_scgi] shutdown remote only if local (#2743)

shutdown(fd, SHUT_WR) after sending request to proxy or SCGI
only if remote is local and platform is not *BSD or Darwin.

The reason this fix is special-casing *BSD and Darwin is that the Single
Unix Specification and POSIX.1-2013 clearly specify that POLLHUP event
should be returned by poll only when the stream is no longer writable.
A half-closed socket that is still writable clearly does not match that
condition, yet that is what I am seeing on Darwin (El Capitan), and
presumably what others are seeing on *BSD, from which Apple originally
inherited the Darwin TCP stack.

Single Unix Specification (SUSv2) from 1997
(yes, that is nearly 20 years ago):
http://pubs.opengroup.org/onlinepubs/007908799/xsh/poll.html

POLLHUP
The device has been disconnected. This event and POLLOUT are
mutually exclusive; a stream can never be writable if a hangup has
occurred. However, this event and POLLIN, POLLRDNORM, POLLRDBAND or
POLLPRI are not mutually exclusive. This flag is only valid in the
revents bitmask; it is ignored in the events member.

Updated version of The Open Group Base Specifications Issue 7
(published in 2013):
http://pubs.opengroup.org/onlinepubs/9699919799/

POLLHUP
A device has been disconnected, or a pipe or FIFO has been closed
by the last process that had it open for writing. Once set, the
hangup state of a FIFO shall persist until some process opens the
FIFO for writing or until all read-only file descriptors for the
FIFO are closed. This event and POLLOUT are mutually-exclusive;
a stream can never be writable if a hangup has occurred. However,
this event and POLLIN, POLLRDNORM, POLLRDBAND, or POLLPRI are not
mutually-exclusive. This flag is only valid in the revents bitmask;
it shall be ignored in the events member.

x-ref:
"1.4.40/41 mod_proxy, mod_scgi may trigger POLLHUP on *BSD,Darwin"
https://redmine.lighttpd.net/issues/2743

Revision 4bc06bfc (diff)
Added by gstrauss 8 months ago

[core] check if client half-closed TCP if POLLHUP (#2743)

Check if client half-closed TCP connection if POLLHUP is received.
This more robustly handles if client called shutdown(fd, SHUT_WR).

This patch reverts ab05eb7c which should now be handled properly.
(Time will tell.)

x-ref:
"1.4.40/41 mod_proxy, mod_scgi may trigger POLLHUP on *BSD,Darwin"
https://redmine.lighttpd.net/issues/2743

Revision 27f85dbd (diff)
Added by gstrauss 8 months ago

[core] proxy,scgi omit shutdown() to backend (fixes #2743)

Due to the POLLHUP behavior triggered on *BSD/Darwin, the shutdown()
had previously been limited to local connections. If interested in
squeezing the last bits of performance out of a machine, an admin
should configure local connections to be AF_UNIX instead of AF_INET
or AF_INET6 to localhost. The reason the shutdown() was originally
added in mod_proxy and mod_scgi was to aggressively reduce the number
of potential sockets in TIME_WAIT held by lighttpd.
(See 923688d2 "drain backend socket/pipe bufs upon FDEVENT_HUP",
done for reliability given the aforementioned *BSD/Darwin behavior.)
When using AF_UNIX, the TIME_WAIT issue does not exist, ergo, the
recommendation is to use AF_UNIX for local sockets, when available.
Using AF_UNIX sockets is a better solution to eliminate TIME_WAIT
than is TCP shutdown() half-close which, as we have seen, might not
be handled well by frameworks which are more complex than basic read
request, send response, and close.

x-ref:
"1.4.40/41 mod_proxy, mod_scgi may trigger POLLHUP on *BSD,Darwin"
https://redmine.lighttpd.net/issues/2743

History

#1 Updated by gstrauss 8 months ago

  • Status changed from New to Need Feedback

Who is sending the 500, the buildbot or lighttpd?

lighttpd mod_proxy currently sends an HTTP/1.0 request to the backend, and an explicit 'Connection: close' is included. lighttpd calling shutdown(fd, SHUT_WR) on the backend connection is an additional indication to the backend that there are no further requests to be sent, and that TCP socket shutdown can proceed as soon as the backend sends the response.
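For illustration, the half-close sequence described here can be sketched with a Python socketpair standing in for the lighttpd-to-backend TCP connection (a hedged sketch, not lighttpd code; all names are made up):

```python
import socket

# Simulate mod_proxy's sequence: send the full request, then half-close
# the write side. A well-behaved backend reads until end-of-stream and
# can still write its response afterwards.
frontend, backend = socket.socketpair()  # stands in for the TCP connection

request = b"GET /default.css HTTP/1.0\r\nConnection: close\r\n\r\n"
frontend.sendall(request)
frontend.shutdown(socket.SHUT_WR)  # no further requests will be sent

# Backend side: recv() returning 0 bytes is end-of-stream on the read
# side, not necessarily a closed (unwritable) connection.
data = b""
while True:
    chunk = backend.recv(4096)
    if not chunk:
        break
    data += chunk
backend.sendall(b"HTTP/1.0 200 OK\r\n\r\nbody")  # write side is still open

response = frontend.recv(4096)
```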

#2 Updated by flynn 8 months ago

lighttpd sends the 500.

The connection is closed just after sending the request, lighttpd does not wait for the response.

In the tcpdump trace I see no response packets from the buildbot, only the request itself and the close of the connection.

In the attached example, port 8010 is the remote port of the buildbot instance.

#3 Updated by gstrauss 8 months ago

Is lighttpd not waiting for a response or does the buildbot close the connection to lighttpd without sending a response?

What is the buildbot? (is it open source?)

#4 Updated by gstrauss 8 months ago

See 923688d2 in lighttpd which adds more robust POLLHUP handling to lighttpd, and also adds the call to shutdown(). My hunch is that the buildbot you are using gets a POLLHUP and fails to read remaining data from the socket buffers.

#5 Updated by flynn 8 months ago

Buildbot is open source: http://buildbot.net/
It is a continuous integration application based on the Python Twisted framework.

#6 Updated by gstrauss 8 months ago

What operating system are you on? From a quick glance, it would appear that Twisted properly handles POLLHUP when it also receives POLLIN. However, many OSes are inconsistent with one another about how BSD-style socket events are reported. Linux introduced POLLRDHUP (which must be requested) to signify that there is no more data to be read. Many applications consider POLLHUP to be an error indicating client disconnect, when it might not be an error if the client did a half-close on the socket with shutdown(fd, SHUT_WR), and the OS returned POLLHUP as an event on the polled fd, even if the connection is still writable.
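The opt-in POLLRDHUP mentioned here looks roughly like this on Linux (a sketch using Python's select.poll on a socketpair; Linux-specific):

```python
import select
import socket

# POLLRDHUP must be requested explicitly in the events mask; it signals
# that the peer has shut down its write side (no more data will arrive),
# which is distinct from POLLHUP (full hangup).
a, b = socket.socketpair()
p = select.poll()
p.register(b.fileno(), select.POLLIN | select.POLLRDHUP)

a.shutdown(socket.SHUT_WR)  # peer half-closes its write side
revents = dict(p.poll(1000))[b.fileno()]
half_closed = bool(revents & select.POLLRDHUP)
full_hangup = bool(revents & select.POLLHUP)
```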

Would you please confirm whether or not lighttpd is seeing that the backend (buildbot) closed the connection before lighttpd returns 500 because the backend did not send a response? In an strace, you should see that lighttpd reads (read() or recv()) 0 bytes from the backend, indicating end-of-stream.

#7 Updated by flynn 8 months ago

OS is Linux. A longer strace:

writev(63, [{"GET /default.css HTTP/1.0\r\nUser-"..., 185}], 1) = 185
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0
write(27, "2016-08-02 12:38:55: (mod_proxy."..., 71) = 71
epoll_ctl(28, EPOLL_CTL_MOD, 63, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=63, u64=63}}) = 0
shutdown(63, SHUT_WR) = 0
epoll_wait(28, {{EPOLLIN|EPOLLHUP, {u32=63, u64=63}}}, 1025, 1000) = 1
ioctl(63, FIONREAD, [0]) = 0
epoll_ctl(28, EPOLL_CTL_DEL, 63, {0, {u32=0, u64=0}}) = 0
close(63) = 0

There is no read/recv from the remote side.
epoll reports an IN event without any data available or error??

The problem is that ioctl FIONREAD reports 0 bytes to read,
which leads lighttpd to close the connection.

In the tcpdump you can see:
- the packet with payload containing the http request
- an ACK packet from remote
- lighttpd begins closing the connection (FIN, ACK)

Without the call to shutdown, ioctl FIONREAD reports the correct size and everything is fine.

The call to the shutdown function modifies the result of ioctl FIONREAD ...

But why?

#8 Updated by gstrauss 8 months ago

Would you strace the buildbot side of the connection?
(lighttpd does not need to read() or recv() 0 bytes if ioctl() tells lighttpd that it will read 0 bytes)
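The FIONREAD ambiguity here can be shown in a small sketch (Python on Linux, using a socketpair; illustrative, not lighttpd code): FIONREAD only reports what is currently buffered, so a half-closed peer that sent nothing looks the same as an idle-but-open one.

```python
import array
import fcntl
import socket
import termios

# FIONREAD reports bytes currently buffered for reading; by itself it
# cannot distinguish "peer sent nothing and half-closed" from "peer is
# open but has not sent anything yet".
a, b = socket.socketpair()
a.sendall(b"hello")

buf = array.array('i', [0])
fcntl.ioctl(b.fileno(), termios.FIONREAD, buf)
buffered_before = buf[0]        # 5: the queued bytes

b.recv(5)                       # drain the buffer
a.shutdown(socket.SHUT_WR)      # peer half-closes without sending more

fcntl.ioctl(b.fileno(), termios.FIONREAD, buf)
buffered_after = buf[0]         # 0: indistinguishable from an idle peer
```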

Let me be clear so that I don't keep asking the same question:
please demonstrate that buildbot has sent a response before buildbot closes its side of the connection.

#9 Updated by flynn 8 months ago

Buildbot does not send any response, because it itself receives zero bytes and closes the connection.

strace Buildbot with timestamps:

13:41:32.816716 accept(17, {sa_family=AF_INET, sin_port=htons(40978), sin_addr=inet_addr("127.0.0.1")}, [16]) = 12
13:41:32.816852 fcntl(12, F_GETFD)      = 0
13:41:32.816906 fcntl(12, F_SETFD, FD_CLOEXEC) = 0
13:41:32.817038 fcntl(12, F_GETFL)      = 0x2 (flags O_RDWR)
13:41:32.817083 fcntl(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0
13:41:32.817191 epoll_ctl(6, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=12, u64=40046962262671372}}) = 0
13:41:32.817287 accept(17, 0x7ffc0da91700, [16]) = -1 EAGAIN (Resource temporarily unavailable)
13:41:32.817411 epoll_wait(6, {{EPOLLIN, {u32=12, u64=40046962262671372}}}, 6, 660) = 1
13:41:32.817484 recvfrom(12, "GET /default.css HTTP/1.0\r\nUser-"..., 65536, 0, NULL, NULL) = 185
13:41:32.817731 getsockname(12, {sa_family=AF_INET, sin_port=htons(8010), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
13:41:32.817876 stat("/var/lib/buildbot/masters/MapOneLibs/public_html", {st_mode=S_IFDIR|0750, st_size=4096, ...}) = 0
13:41:32.817998 stat("/var/lib/buildbot/masters/MapOneLibs/public_html/default.css", {st_mode=S_IFREG|0640, st_size=10137, ...}) = 0
13:41:32.818126 stat("/var/lib/buildbot/masters/MapOneLibs/public_html/default.css", {st_mode=S_IFREG|0640, st_size=10137, ...}) = 0
13:41:32.818211 open("/var/lib/buildbot/masters/MapOneLibs/public_html/default.css", O_RDONLY) = 13
13:41:32.818256 fstat(13, {st_mode=S_IFREG|0640, st_size=10137, ...}) = 0
13:41:32.818361 fstat(13, {st_mode=S_IFREG|0640, st_size=10137, ...}) = 0
13:41:32.818398 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0478506000
13:41:32.818443 read(13, "body.interface {\n\tmargin-left: 3"..., 65536) = 10137
13:41:32.818490 read(13, "", 53248)     = 0
13:41:32.818655 epoll_ctl(6, EPOLL_CTL_MOD, 12, {EPOLLIN|EPOLLOUT, {u32=12, u64=40046962262671372}}) = 0
13:41:32.818774 epoll_wait(6, {{EPOLLIN|EPOLLOUT, {u32=12, u64=40046962262671372}}}, 6, 659) = 1
13:41:32.818843 recvfrom(12, "", 65536, 0, NULL, NULL) = 0
13:41:32.818895 epoll_ctl(6, EPOLL_CTL_MOD, 12, {EPOLLOUT, {u32=12, u64=40046962262671372}}) = 0
13:41:32.818989 close(13)               = 0
13:41:32.819025 munmap(0x7f0478506000, 4096) = 0
13:41:32.819084 epoll_ctl(6, EPOLL_CTL_DEL, 12, {EPOLLWRNORM|EPOLLWRBAND|EPOLLRDHUP|EPOLLONESHOT|0x3540c800, {u32=32516, u64=24336577484324612}}) = 0
13:41:32.819160 shutdown(12, SHUT_RDWR) = 0
13:41:32.819246 close(12)               = 0

Comparing with a corresponding tcpdump shows that

13:41:32.818843 recvfrom(12, "", 65536, 0, NULL, NULL) = 0

is the reaction to the <FIN,ACK> TCP packet from the lighttpd port.

strace lighttpd:

13:41:32.816353 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 63
13:41:32.816398 fcntl(63, F_SETFD, FD_CLOEXEC) = 0
13:41:32.816433 fcntl(63, F_SETFL, O_RDWR|O_NONBLOCK) = 0
13:41:32.816470 connect(63, {sa_family=AF_INET, sin_port=htons(8010), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
13:41:32.816603 epoll_ctl(28, EPOLL_CTL_ADD, 63, {EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=63, u64=63}}) = 0
13:41:32.816668 accept(10, 0x7ffc69d03020, [112]) = -1 EAGAIN (Resource temporarily unavailable)
13:41:32.816737 epoll_wait(28, {{EPOLLOUT, {u32=63, u64=63}}}, 1025, 1000) = 1
13:41:32.816802 getsockopt(63, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
13:41:32.816868 epoll_ctl(28, EPOLL_CTL_MOD, 63, {EPOLLIN|EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=63, u64=63}}) = 0
13:41:32.816924 writev(63, [{"GET /default.css HTTP/1.0\r\nUser-"..., 185}], 1) = 185
13:41:32.817035 epoll_ctl(28, EPOLL_CTL_MOD, 63, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=63, u64=63}}) = 0
13:41:32.817092 shutdown(63, SHUT_WR)   = 0
13:41:32.817148 epoll_wait(28, {{EPOLLIN|EPOLLHUP, {u32=63, u64=63}}}, 1025, 1000) = 1
13:41:32.819253 ioctl(63, FIONREAD, [0]) = 0
13:41:32.819319 epoll_ctl(28, EPOLL_CTL_DEL, 63, {0, {u32=0, u64=0}}) = 0
13:41:32.819374 close(63)               = 0

I hope this is what you requested.

I forgot to mention that I have a "double" proxy configuration, which is needed to rewrite a proxied request;
see the last answer to https://redmine.lighttpd.net/issues/164. (This is maybe another issue to fix better in lighttpd.)

#10 Updated by gstrauss 8 months ago

13:41:32.817484 recvfrom(12, "GET /default.css HTTP/1.0\r\nUser-"..., 65536, 0, NULL, NULL) = 185
13:41:32.818843 recvfrom(12, "", 65536, 0, NULL, NULL) = 0

buildbot receives the request, but is not sending a response. Please file an issue with buildbot/twisted and post the link here. I'd like to follow it, but might not have time myself to dig into those many thousands of lines of python.

.

Separately, regarding #164, if you'd like to submit a patch, I'll review it. However, this is lower priority than tickets such as TLS and mod_auth. Also, I can predict that there will be a follow-up request by someone who wants to parse the HTML or XML or ??? and rewrite all the paths in the response from the backend.

Is buildbot really written in such a way as to require the root of the website? It doesn't have a /$prefix/ on the URL? Do all the links generated by buildbot work, or are they missing the "/$prefix/"?

#11 Updated by flynn 8 months ago

Buildbot really does not support a prefix option or feature, but I posted an alternative approach rewriting proxy requests with lua/mod_magnet in ticket #164.

Back to the problem with shutdown(..., SHUT_WR): yesterday I had the same problem with self-written SCGI backends, and on these backends I could debug and understand the problem.

shutdown(..., SHUT_WR) needs special handling on the remote side: it is reported on the remote side with a read of 0 bytes.
Many applications treat a read of zero bytes as socket close and start to close their socket. This is what happens here.

This is wrong, as long as no error is reported.
The application must distinguish between read 0 bytes with and without reported error.
With this I could fix my SCGI backends; they now run with an unmodified mod_scgi.
I'll take a look at Twisted to see whether I can suggest a fix there.
The Perl/Moose framework may have the same problem ...
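The distinction described above, a read of zero bytes after a half-close versus after a real close or error, can be sketched like this (Python socketpair, illustrative only):

```python
import socket

# A peer's shutdown(SHUT_WR) is seen locally as recv() returning 0 bytes.
# That alone does not mean the connection is closed: the local write side
# is still usable, and treating the 0-byte read as "socket closed" (as
# the backend did here) loses the response.
a, b = socket.socketpair()
a.shutdown(socket.SHUT_WR)      # half-close: a will send nothing more

eof = b.recv(4096)              # b"" -> end-of-stream on the read side
b.sendall(b"still writable")    # ...but b can still send without error
reply = a.recv(4096)
```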

#12 Updated by stbuehler 8 months ago

I'm pretty sure HTTP defines a "client side close" as connection abort (whether explicitly in the RFC, or implicitly because everyone did it), as the client has no other way to indicate it isn't interested in the response anymore; if the client closes the connection completely, the server would get a TCP RST, but only after it tries to send additional response data, not while simply polling for some events for example.

In other words, an HTTP client (including mod_proxy) must not call shutdown(SHUT_WR) before it received the response, as long as it is interested in getting the response.

#13 Updated by gstrauss 8 months ago

@flynn: thanks for your continued troubleshooting. It remains to be seen how many others run into this, and I recognize that there is a potential for others to trip over this. As you pointed out above, your code (and maybe buildbot/twisted) made the assumption that reading end-of-stream meant that the socket was closed, and this was a mistaken assumption since it did not differentiate between a half-close of a socket and an error on the socket. Handling TCP sockets has numerous edge cases, and there are many frameworks that attempt to hide this complexity from application-level programmers. If you're writing an asynchronous program, then there is an additional burden of understanding some of these nuances.

On the other hand, I posit that most (not all) code will have been coded simply. Most code will simply read a request in a blocking fashion and will process the request once the request is complete. Such simple code which handles a single request (and not HTTP/1.1 keep-alive) may not ever read end-of-stream since the code will read a single request, send a single response, and close the socket connection.

.

@stbuehler: can you point me to any references to back up the claims you make in your post above?

Some excerpts from the following appear to indicate the opposite of what you are saying
https://www.safaribooksonline.com/library/view/http-the-definitive/1565925092/ch04s07.html

TCP close and reset errors

Simple HTTP applications can use only full closes. But when applications start talking to many other types of HTTP clients, servers, and proxies, and when they start using pipelined persistent connections, it becomes important for them to use half closes to prevent peers from getting unexpected write errors.

In general, closing the output channel of your connection is always safe. The peer on the other side of the connection will be notified that you closed the connection by getting an end-of-stream notification once all the data has been read from its buffer.

[...]

Graceful close

The HTTP specification counsels that when clients or servers want to close a connection unexpectedly, they should "issue a graceful close on the transport connection," but it doesn't describe how to do that.

In general, applications implementing graceful closes will first close their output channels and then wait for the peer on the other side of the connection to close its output channels. When both sides are done telling each other they won't be sending any more data (i.e., closing output channels), the connection can be closed fully, with no risk of reset.

Unfortunately, there is no guarantee that the peer implements or checks for half closes. For this reason, applications wanting to close gracefully should half close their output channels and periodically check the status of their input channels (looking for data or for the end of the stream). If the input channel isn't closed by the peer within some timeout period, the application may force connection close to save resources.

All that said, here's a link to a discussion from 2001, which is much older than the book above:
https://lists.w3.org/Archives/Public/ietf-http-wg-old/2001JanApr/0037.html
https://lists.w3.org/Archives/Public/ietf-http-wg-old/2001JanApr/0039.html

That last link above is from one of the authors of the HTTP spec, Roy Fielding.

https://tools.ietf.org/html/rfc7230#section-6.6 Tear-down
recommends that the server tear down connections in multiple steps, starting with a half-close, but the spec does not say anything about whether the client can send half-close. See Roy Fielding's note in the ietf-http-wg message above for his suggestion why the spec should not specify such details, since HTTP protocol can be sent over many different transports, of which TCP is one possible transport.
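The graceful close procedure quoted from the book above can be sketched as follows (Python, illustrative; the helper name is made up):

```python
import socket

def graceful_close(sock, timeout=5.0):
    """Half-close the output channel, drain the input channel until
    end-of-stream (with a timeout), then fully close -- the 'graceful
    close' procedure described in the excerpt."""
    sock.shutdown(socket.SHUT_WR)      # tell the peer: no more data from us
    sock.settimeout(timeout)
    try:
        while sock.recv(4096):         # wait for the peer's end-of-stream
            pass
    except socket.timeout:
        pass                           # peer never half-closed; give up
    finally:
        sock.close()                   # both sides done; close fully

a, b = socket.socketpair()
b.shutdown(socket.SHUT_WR)             # peer is already done sending
graceful_close(a)
closed = (a.fileno() == -1)            # Python reports -1 after close
```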

#14 Updated by gstrauss 8 months ago

  • Subject changed from mod_proxy fails on some requests with 1.4.40/41 to 1.4.40/41 mod_proxy, mod_scgi may trigger POLLHUP on *BSD,Darwin
  • Status changed from Need Feedback to Patch Pending

It appears that a similar situation was reported in the past: #2688 "[PATCH] LightyTest.pm: do not shutdown on darwin"

I tested on Darwin and was able to reproduce what @flynn is seeing.

Following this post will be two commits. The first restricts the shutdown() calls in mod_proxy and mod_scgi to non-*BSD and non-Darwin platforms, and then only when the "remote" is localhost, since a remote machine could be one of the affected OSes even if the machine running lighttpd is not. The second commit will update lighttpd to better handle POLLHUP received when the event is due to a half-close (shutdown(fd, SHUT_WR) by the client) as opposed to a full-close or TCP RST, as lighttpd is also treating POLLHUP on the client socket as an unconditional error. The fix applies to most *BSD/Darwin platforms, except on OpenBSD, which does not support getsockopt() TCP_INFO.
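On Linux, the half-close check used by the second commit can be sketched with getsockopt(TCP_INFO): after recv() returns 0, a tcpi_state of TCP_CLOSE_WAIT means the peer half-closed rather than the connection being gone. (A sketch only; the state constant and struct layout are Linux-specific, and the actual commit is C, not Python.)

```python
import socket

TCP_CLOSE_WAIT = 8  # tcpi_state value from linux/tcp.h

# Server accepts one connection; the client half-closes with SHUT_WR.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
cli = socket.create_connection(srv.getsockname())
conn, _ = srv.accept()

cli.shutdown(socket.SHUT_WR)            # client half-close, like mod_proxy
eof = conn.recv(16)                     # b"": end-of-stream after the FIN

# The first byte of struct tcp_info is tcpi_state.
info = conn.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 8)
half_closed = (info[0] == TCP_CLOSE_WAIT)
```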

The reason this fix is special-casing *BSD and Darwin is that the Single Unix Specification and POSIX.1-2013 clearly specify that POLLHUP event should be returned by poll only when the stream is no longer writable. A half-closed socket that is still writable clearly does not match that condition, yet that is what I am seeing on Darwin (El Capitan), and presumably what others are seeing on *BSD, from which Apple originally inherited the Darwin TCP stack.

Single Unix Specification (SUSv2) from 1997 (yes, that is nearly 20 years ago):
http://pubs.opengroup.org/onlinepubs/007908799/xsh/poll.html

POLLHUP
The device has been disconnected. This event and POLLOUT are mutually exclusive; a stream can never be writable if a hangup has occurred. However, this event and POLLIN, POLLRDNORM, POLLRDBAND or POLLPRI are not mutually exclusive. This flag is only valid in the revents bitmask; it is ignored in the events member.

Updated version of The Open Group Base Specifications Issue 7, published in 2013:
http://pubs.opengroup.org/onlinepubs/9699919799/

POLLHUP
A device has been disconnected, or a pipe or FIFO has been closed by the last process that had it open for writing. Once set, the hangup state of a FIFO shall persist until some process opens the FIFO for writing or until all read-only file descriptors for the FIFO are closed. This event and POLLOUT are mutually-exclusive; a stream can never be writable if a hangup has occurred. However, this event and POLLIN, POLLRDNORM, POLLRDBAND, or POLLPRI are not mutually-exclusive. This flag is only valid in the revents bitmask; it shall be ignored in the events member.

I would be very interested if someone from the *BSD community would explain the history behind what appears to me to be a direct violation of these core Unix specifications, which have not changed how POLLHUP should be handled over the last (at least) ~20 years.

#15 Updated by gstrauss 8 months ago

  • Related to Bug #2688: [PATCH] LightyTest.pm: do not shutdown on darwin added

#16 Updated by gstrauss 8 months ago

Having posted all that specific to *BSD/Darwin, @flynn reports above that his system is Linux. I haven't been able to reproduce this on Linux (using Linux kernel 4.6.4 on Fedora 24 and a small test-case I wrote). A further patch to lighttpd may be advisable, depending on whether the problem is tracked down to Python's Twisted framework or if others see similar issues in other frameworks. In @flynn's case, I am under the impression that Linux is not sending POLLHUP, but Twisted is reading end-of-stream (0 bytes for recv()/recvfrom()) and is treating that as a fully-closed socket instead of checking to see if the socket is half-closed.

#17 Updated by gstrauss 8 months ago

I pushed the second patch to personal/gstrauss/master branch at
git clone https://git.lighttpd.net/lighttpd/lighttpd1.4.git

I'd like some feedback from some folks on *BSD platforms before this patch gets pushed to master. Thanks.

(Note: my dev branch might be rewritten without warning)

#18 Updated by gstrauss 8 months ago

4bc06bfc demonstrates how to check for half-closed socket on some popular OSes, after recv() of 0 bytes on AF_INET or AF_INET6 socket.

@flynn: have you had a chance to dig deeper into buildbot or twisted to track down where the one or the other detects end-of-stream and treats it as a socket error?

#19 Updated by stbuehler 8 months ago

gstrauss wrote:

Some excerpts from the following appear to indicate the opposite of what you are saying
https://www.safaribooksonline.com/library/view/http-the-definitive/1565925092/ch04s07.html

TCP close and reset errors

Simple HTTP applications can use only full closes. But when
applications start talking to many other types of HTTP clients,
servers, and proxies, and when they start using pipelined persistent
connections, it becomes important for them to use half closes to
prevent peers from getting unexpected write errors.

In general, closing the output channel of your connection is always
safe. The peer on the other side of the connection will be notified
that you closed the connection by getting an end-of-stream
notification once all the data has been read from its buffer.

Don't see how that section is related to anything apart from
recommending a graceful close so the other end doesn't see hard errors
(aka RST, which it still needs to be able to handle).

[...]

Graceful close

The HTTP specification counsels that when clients or servers want to
close a connection unexpectedly, they should "issue a graceful close
on the transport connection," but it doesn't describe how to do
that.

This clearly supports my point; if you want to abort ("close
unexpectedly"), close the connection (and again the recommendation to do
it gracefully). While it is not written that way, imho this clearly
indicates an unexpected close (graceful or not) is to be considered a
connection abort.

In general, applications implementing graceful closes will first
close their output channels and then wait for the peer on the other
side of the connection to close its output channels. When both sides
are done telling each other they won't be sending any more data
(i.e., closing output channels), the connection can be closed fully,
with no risk of reset.

This just describes how to avoid TCP RST packets.

Unfortunately, there is no guarantee that the peer implements or
checks for half closes. For this reason, applications wanting to
close gracefully should half close their output channels and
periodically check the status of their input channels (looking for
data or for the end of the stream). If the input channel isn't
closed by the peer within some timeout period, the application may
force connection close to save resources.

Don't wait too long for the other end. Nothing interesting here.

All that said, here's a link to a discussion from 2001, which is much
older than the book above:

https://lists.w3.org/Archives/Public/ietf-http-wg-old/2001JanApr/0037.html

This guy clearly states that he assumes most implementations are of type
"2" (send FIN after receiving response) and "B" (treat FIN as abort),
which is exactly what I described.

It would be somewhat safe to implement "A" (ignore FIN on server), but as
I already said this makes it impossible to implement long polls - all
your requests need to be handled quickly, otherwise you'll end up with a
lot of hanging connections. (Long poll requests must not be combined
with pipelining ofc, but afaik no client does pipelining by default.)

I think lighttpd 1.4.x is probably of type "A" right now, but lighttpd 2
is of type "B"
(https://git.lighttpd.net/lighttpd/lighttpd2.git/tree/src/main/connection.c#n205).

https://lists.w3.org/Archives/Public/ietf-http-wg-old/2001JanApr/0039.html

At least that guy supports your theory. Doesn't have any arguments
though. As I said before, this interpretation takes away the possibility
for a client to abort a request / "really close the connection".

That last link above is from one of the authors of the HTTP spec, Roy
Fielding.

"proof by authority" or "ignore this crazy idiot who designed this
broken shit"...

https://tools.ietf.org/html/rfc7230#section-6.6 Tear-down recommends
that the server tear down connections in multiple steps, starting with
a half-close, but the spec does not say anything about whether the
client can send half-close. See Roy Fielding's note in the ietf-http-
wg message above for his suggestion why the spec should not specify
such details, since HTTP protocol can be sent over many different
transports, of which TCP is one possible transport.

It's a lame excuse not to want to care about important details, and
makes his opinion even less important.

You'd actually need to clearly specify what you want from the
transport. E.g. you could say:
  • Transport needs semantics for connection close (not half-close)
  • Transport optionally can additionally provide semantics for half-
    close

Then you'd define how you want to handle these in context of HTTP; and
for each transport you want to run HTTP over you'd need to define how to
map them.

As TCP is the official way to run HTTP on, the standard should have
specified how to map semantics; but as TCP can only provide one of the
requirements, the mapping would be obvious.

The recommendation to do graceful TCP closes belongs only in the TCP
part, because it has nothing to do with HTTP.

Btw: HTTP/2 does it exactly that way:

RFC 7540: 5.4.1 Connection Error Handling (alternative link, ietf seems slow today)

An endpoint that encounters a connection error SHOULD first send a
GOAWAY frame (Section 6.8) [...]. After sending the GOAWAY frame for
an error condition, the endpoint MUST close the TCP connection.

Nothing about "half-close" anymore, a TCP connection is only open or
closed as seen by HTTP.

RFC 7230: 6.5. Failures and Timeouts

[...]
A client or server that wishes to time out SHOULD issue a graceful
close on the connection. Implementations SHOULD constantly monitor
open connections for a received closure signal and respond to it as
appropriate, since prompt closure of both sides of a connection
enables allocated system resources to be reclaimed.
[...]

"appropriate" could mean a lot of things of course; but a peer that
"timed out" a connection is no longer expecting a useful response, and
the part about the system resources clearly indicates you should simply
cancel the request and close the connection.

If you received the full request body it is imho a good idea (but not a
hard requirement) to forward that request to the backend, and then
indicate to the backend in a similar way (for HTTP: graceful close) that
you are not interested in a response.

RFC 7230: 6.6. Tear-down

[...]
A client that sends a "close" connection option MUST NOT send further
requests on that connection (after the one containing "close") and
MUST close the connection after reading the final response message
corresponding to this request.
[...]

Stated very clearly: close after reading the response.

Summary:
  • `mod_proxy` has always been of type "2", you broke it.
  • TCP half-close is the only way a client has to reliably indicate
    connection abort, and all major clients use it exactly that way. RFC
    7230 clearly supports this interpretation, whatever the original HTTP
    author thought in 2001.

Please revert your change now.

#20 Updated by flynn 8 months ago

I took a look at haproxy, how socket closing is handled.

haproxy uses shutdown(..., SHUT_WR) only on pure TCP-connections, never on http-connections.

I verified this in the code and live with strace (i have a bigger installation of haproxy running ...)

There must be a reason why haproxy does not use shutdown; either lighttpd should take this as a hint,
or the author of haproxy should be asked. I think he is a person with very deep knowledge and experience
in HTTP proxying.

I also want to suggest a compromise:
  • keep shutdown(..., SHUT_WR) as default
  • add an option in mod_proxy, e.g. proxy.broken_backend = 0/1, to omit the call to shutdown

P.S.: I did not have time to look deeper at python/twisted, but I will do it ...

#21 Updated by gstrauss 8 months ago

@stbuehler wrote:

  • TCP half-close is the only way a client has to reliably indicate
    connection abort

Please keep in mind that the HTTP spec does not say this.
Also, you previously criticized the HTTP spec, suggesting that
the HTTP spec should be more explicit about TCP behavior.
(I am interpreting your comments in the context of HTTP.)

The HTTP spec recommends a graceful close when a mechanism is available.
When HTTP uses TCP transport, a TCP socket half-close is available to
indicate a graceful close.

A graceful close is not necessarily the same thing as connection abort.
I think you and I have different ideas about what encompasses a
"connection abort". I consider "normal connection close" different from
"connection abort". Also, I do not consider an "unexpected close"
necessarily always equivalent to "connection abort".

  • TCP half-close is the only way a client has to reliably indicate
    connection abort

Connection abort is reliably (and portably) implemented in TCP by
sending TCP RST, although TCP RST is not graceful.

  struct linger li = { 1, 0 }; /* { .l_onoff = 1, .l_linger = 0 } */
  setsockopt(fd, SOL_SOCKET, SO_LINGER, &li, sizeof(li));
  close(fd);

Similarly, shutdown(fd, SHUT_RD) is not graceful, but indicates abort
the next time (or maybe the time after that) the remote tries to write.
On pipes, closing the read side (so write by the remote gets EPIPE)
indicates abort, but is not graceful.

.

@stbuehler: I disagree with your use of the word "clearly" in your
arguments, conclusions, and tl;dr, but it is clearly not worth my effort
to try to convince you otherwise.

.

I am planning to remove the shutdown() calls in mod_proxy, mod_scgi
for practical reasons. Due to the POLLHUP behavior triggered on
*BSD/Darwin, the shutdown() has been limited to local connections.
If interested in squeezing the last bits of performance out of a
machine, an admin should configure local connections to be AF_UNIX
instead of AF_INET or AF_INET6 to localhost. The reason I originally
added the shutdown() in mod_proxy and mod_scgi was to aggressively
reduce the number of potential sockets in TIME_WAIT held by lighttpd.
(See 923688d2 "drain backend socket/pipe bufs upon FDEVENT_HUP",
done for reliability given the aforementioned *BSD/Darwin behavior.)
When using AF_UNIX, the TIME_WAIT issue does not exist, ergo, the
recommendation is to use AF_UNIX for local sockets, when available.
Using AF_UNIX sockets is a better solution to eliminate TIME_WAIT
than is TCP shutdown() half-close which, as we have seen, might not
be handled well by frameworks which are more complex than basic read
request, send response, and close.
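
For mod_scgi, the AF_UNIX recommendation might look like the following (the URL prefix and socket path are illustrative only; mod_proxy's own UNIX-socket support may depend on your lighttpd version):

```
# illustrative only: run the SCGI backend on a UNIX-domain socket
# instead of TCP to localhost, avoiding TIME_WAIT entirely
scgi.server = ( "/app" =>
  (( "socket" => "/var/run/lighttpd/app.sock",
     "check-local" => "disable"
  ))
)
```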

#22 Updated by gstrauss 8 months ago

  • Status changed from Patch Pending to Fixed
  • % Done changed from 0 to 100

#23 Updated by stbuehler 7 months ago

  • Duplicated by Bug #2751: mod_proxy closes connection added

#24 Updated by brad@comstyle.com 6 months ago

The commit 4bc06bfc0b72b92c67f21ac0dc7de7d67be73728 breaks building on OpenBSD.

fdevent.c: In function 'fdevent_is_tcp_half_closed':
fdevent.c:305: error: storage size of 'tcpi' isn't known
fdevent.c:307: error: 'TCP_INFO' undeclared (first use in this function)
fdevent.c:307: error: (Each undeclared identifier is reported only once
fdevent.c:307: error: for each function it appears in.)
fdevent.c:305: warning: unused variable 'tcpi'

OpenBSD has TCPS_CLOSE_WAIT but not TCP_INFO.

#25 Updated by gstrauss 6 months ago

Brad: please see the subsequent commit 6ec66c4d from 20 Aug, which fixed that issue on OpenBSD and DragonflyBSD

#26 Updated by brad@comstyle.com 6 months ago

Oh, thanks. I had been looking through all of the commits to HEAD but skipped that one, as I thought it contained only DragonFly-related changes. The diff with the relevant bit is similar to what I had come up with locally to make the code build.

Also available in: Atom