Bug #657
lighty vs. apt-get - problem with pipelining
| Status: | Fixed | Start: | |
| Priority: | Normal | Due date: | |
| Assigned to: | - | % Done: | 0% |
| Category: | core | | |
| Target version: | 1.4.19 | | |
Description
Symptoms: running an in-house Debian mirror on lighty, I've noticed that the connection is sometimes "reset by peer" while downloading packages. This happens fairly rarely (roughly 1% of the time, i.e. about 1 in 100 packages triggers the problem). Retrying helps. Also, adding
Acquire::http::Pipeline-Depth "0";
to the apt config (i.e. disabling HTTP/1.1 pipelining) seems to be a valid workaround.
However, the APT docs are emphatic that this setting is only needed for servers that are not standards-compliant. Quoting from 'man apt.conf':
"One setting is provided to control the pipeline depth in cases where the
remote server is not RFC conforming or buggy (such as Squid 2.0.2)
Acquire::http::Pipeline-Depth can be a value from 0 to 5 indicating how
many outstanding requests APT should send. A value of zero MUST be specified if
the remote host does not properly linger on TCP connections - otherwise data
corruption will occur. Hosts which require this are in violation of RFC 2068."
tcpdumps are available if anyone's interested.
cheers,
raas
-- raas
Associated revisions
fixed handling of EAGAIN in linux-sendfile (fixes #657)
History
Updated by moo over 3 years ago
There can be keep-alive limits on the server side, and the client never knows when it reaches the limit. The client should retry if the pipelined requests fail. Even the first request in the pipeline should be tried twice, according to the HTTP RFC.
Updated by Anonymous over 2 years ago
We were also seeing this issue when using APT with lighttpd 1.4.13 over a fast LAN. I did some test runs with strace and tcpdump, and found the following in the strace output:
open("/srv/ubuntu/pool/main/p/parted/parted_1.7.1-2.1ubuntu3_i386.deb", O_RDONLY|O_LARGEFILE) = 10
fcntl64(10, F_SETFD, FD_CLOEXEC) = 0
sendfile64(8, 10, [0], 55200) = -1 EAGAIN (Resource temporarily unavailable)
setsockopt(8, SOL_TCP, TCP_CORK, [0], 4) = 0
write(3, "66.230.200.243 apt.wikimedia.org - [06/Dec/2006:00:36:34 +0000] \"GET /ubuntu/pool/main/p/parted/parted_1.7.1-2.1ubuntu3_i386.deb HTTP/1.1\" 200 0 \"-\" \"Ubuntu APT-HTTP/1.3\"\n", 171) = 171
close(10) = 0
shutdown(8, 1 /* send */) = 0
lighty receives an EAGAIN on sendfile() (buffer full?), then shuts down the sending side of the socket and logs a 200 OK?!?
The following code in network_linux_sendfile.c seems responsible for this:
if (-1 == (r = sendfile(fd, c->file.fd, &offset, toSend))) {
    switch (errno) {
    case EAGAIN:
    case EINTR:
        r = 0;
        break;
    case EPIPE:
    case ECONNRESET:
        return -2;
    default:
        log_error_write(srv, __FILE__, __LINE__, "ssd",
                "sendfile failed:", strerror(errno), fd);
        return -1;
    }
}
if (r == 0) {
    /* We got an event to write but we wrote nothing
     *
     * - the file shrinked -> error
     * - the remote side closed inbetween -> remote-close */
    if (HANDLER_ERROR == stat_cache_get_entry(srv, con, c->file.name, &sce)) {
        /* file is gone ? */
        return -1;
    }
    if (offset > sce->st.st_size) {
        /* file shrinked, close the connection */
        return -1;
    }
    return -2;
}
If EAGAIN, r is set to 0. In the following code block, it's assumed that either the source file has shrunk (not the case) or the remote end must have closed the connection (not the case either). The latter seems strange to me - shouldn't ECONNRESET be returned for that?
EAGAIN likely means that some buffer is full, in which case this code returns -2 which makes lighty close down the connection.
I disabled the return -2, which seems to fix this issue. However, there seems to be another bug occurring with APT...
-- mark
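The distinction described in the comment above can be sketched as a small classification function. This is only an illustration of the idea (keep "would block" separate from "wrote nothing"), not the patch that was eventually applied. The enum and the function name classify_sendfile are hypothetical and do not exist in lighttpd. The handling of a genuine 0 return simply mirrors the quoted code.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical result codes for this sketch; not lighttpd's own values. */
enum write_result {
    WRITE_PROGRESS,      /* some bytes were sent */
    WRITE_RETRY_LATER,   /* socket buffer full or call interrupted */
    WRITE_REMOTE_CLOSED, /* peer closed or reset the connection */
    WRITE_ERROR          /* unexpected failure, or the file shrank */
};

/*
 * Classify the outcome of one sendfile() call.
 *   r         - return value of sendfile()
 *   err       - errno captured right after the call (only used if r < 0)
 *   offset    - current offset into the file
 *   file_size - current size of the file, e.g. from stat()
 */
enum write_result classify_sendfile(ssize_t r, int err,
                                    off_t offset, off_t file_size)
{
    if (r < 0) {
        switch (err) {
        case EAGAIN:
        case EINTR:
            /* Not a close: keep the connection and wait for the
             * next "socket writable" event before trying again. */
            return WRITE_RETRY_LATER;
        case EPIPE:
        case ECONNRESET:
            return WRITE_REMOTE_CLOSED;
        default:
            return WRITE_ERROR;
        }
    }
    if (r == 0) {
        /* sendfile() itself reported 0 bytes; same interpretation
         * as the quoted code: shrunken file or remote close. */
        if (offset > file_size) {
            return WRITE_ERROR;
        }
        return WRITE_REMOTE_CLOSED;
    }
    return WRITE_PROGRESS;
}
```

With the values from the strace above (r = -1, errno = EAGAIN, offset 0, 55200 bytes to send) this returns WRITE_RETRY_LATER instead of falling through to the remote-close path.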
Updated by Anonymous over 2 years ago
...and the other issue was simply that server.max-keep-alive-requests was too low: it is not set to 128 by default as the manual used to say, but to 16.
lighttpd correctly closed the connection after 16 requests with Connection: close, but apparently APT doesn't handle that correctly.
-- mark
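To make the keep-alive limit concrete, here is a toy model of a client fetching many files from a server that closes each connection after a fixed number of requests. It assumes the client re-queues every unanswered pipelined request on a new connection, which is the behaviour the comments above expect from a client. The function name connections_needed is hypothetical, and the model ignores pipeline depth and timing entirely.

```c
#include <stddef.h>

/*
 * Toy model: the server answers at most `keepalive_limit` requests per
 * connection and then closes it. The client re-sends whatever was left
 * unanswered on a fresh connection.
 *
 * Returns how many connections are needed to complete `total` requests.
 * A limit of 0 is treated as "no request can ever complete" and yields 0.
 */
size_t connections_needed(size_t total, size_t keepalive_limit)
{
    size_t done = 0;
    size_t connections = 0;

    if (keepalive_limit == 0) {
        return 0;
    }
    while (done < total) {
        size_t remaining = total - done;
        /* The server serves at most keepalive_limit requests,
         * then closes with "Connection: close". */
        size_t answered = remaining < keepalive_limit
                              ? remaining
                              : keepalive_limit;
        done += answered;
        connections++;
    }
    return connections;
}
```

In this model, 40 packages against a limit of 16 need three connections, so a client that does not retry after the first close would stall at package 17.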
Updated by Anonymous about 1 year ago
Hi,
we have the same problems with debian.netcologne.de and deb.grml.org where I tried lighttpd. A fix would be really appreciated.
Thanks Alex
-- formorer
Updated by jan about 1 year ago
- Status changed from New to Fixed
- Resolution set to fixed
a patch for the EAGAIN was applied in r2072