Bug #1477
Lighttpd kills Ubuntu network install / local mirror (closed)
Description
If you try to netboot / net install Ubuntu Gutsy (I haven't tried any other releases) over the network and you host the packages with Lighttpd instead of Apache, the installation fails. This error appears in the logs:
-------------------------------------------------
Dec 2 11:58:19 in-target: After unpacking 1765MB of additional disk space will be used.
Dec 2 11:58:19 in-target: Get:1 http://192.168.1.144 gutsy/main libfuse2 2.7.0-1ubuntu5 121kB
.....
Dec 2 11:58:28 in-target: Get:96 http://192.168.1.144 gutsy/main xmag 1:1.0.1-0ubuntu2 19.1kB
Dec 2 11:58:28 in-target: E: Method http has died unexpectedly!
-------------------------------------------------
If I simply shut down Lighttpd and use Apache instead, it works perfectly.
As you can see, that's 96 GETs in about 9 seconds. Either Lighttpd can't handle it or it intentionally cuts us off :( My guess is that we get cut off for some reason, but that doesn't make any sense...
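For a rough sense of the load, here is a small Python sketch (an illustration, not part of the original report) that computes the average request rate from the first and last `Get:` timestamps above. The syslog lines omit the year, so the sketch assumes 2007 from the error.log timestamps; syslog only has one-second resolution, so treat the result as approximate.

```python
from datetime import datetime

# Syslog lines omit the year; 2007 is assumed from the error.log timestamps.
# Whitespace in a strptime format matches any run of spaces, so "Dec  2" parses too.
SYSLOG_FMT = "%Y %b %d %H:%M:%S"
YEAR = "2007"


def request_rate(first_ts: str, last_ts: str, n_requests: int) -> float:
    """Average requests per second between two syslog timestamps (1 s resolution)."""
    start = datetime.strptime(f"{YEAR} {first_ts}", SYSLOG_FMT)
    end = datetime.strptime(f"{YEAR} {last_ts}", SYSLOG_FMT)
    elapsed = (end - start).total_seconds()
    if elapsed <= 0:
        raise ValueError("timestamps must be at least one second apart")
    return n_requests / elapsed


if __name__ == "__main__":
    rate = request_rate("Dec 2 11:58:19", "Dec 2 11:58:28", 96)
    print(f"{rate:.1f} requests/second")  # prints "10.7 requests/second"
```

That is roughly 10 to 11 requests per second, which is modest compared with the 150+ requests per second quoted later in this thread for a production lighttpd front end.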
The only modules loaded are mod_access, mod_alias, mod_accesslog, and mod_compress.
I found these relevant errors in error.log:
2007-12-02 10:47:22: (network_linux_sendfile.c.171) sendfile failed: Input/output error 6
2007-12-02 10:47:22: (connections.c.603) connection closed: write failed on fd 6
Ignore the timestamps on those errors; they were copied from an earlier attempt, so obviously they don't match the log lines I posted first.
These errors can't have come from anything else, because the only reason I set up lighttpd was to host this install server.
-- felderado
Updated by admin almost 17 years ago
Do you have a network trace of what happens?
Why doesn't the client retry the request if it fails?
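On the retry question: apt.conf(5) documents an `Acquire::Retries` option for retrying failed downloads, but nothing in this thread says whether the installer sets it. The general client-side pattern is sketched below in Python; `fetch_with_retries` is a hypothetical helper, not apt code, and it assumes any `OSError` (which includes connection resets) is worth retrying.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    retries: int = 3,
    backoff: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on OSError, retry up to `retries` more times.

    The delay between attempts doubles after each failure (exponential backoff).
    `sleep` is injectable so the behaviour can be tested without waiting.
    """
    delay = backoff
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError:
            if attempt == retries:
                raise  # out of retries: surface the last error to the caller
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable: the loop always returns or raises")
```

A client built this way would turn a single dropped connection into a short delay instead of a fatal "Method http has died unexpectedly!", although it would only hide a server-side bug, not fix it.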
Updated by Anonymous almost 17 years ago
I don't have a network trace. I have recreated this from several machines and also virtual machines.
By setting the network backend to "writev" I was able to get through the install process, although only after it failed once and I retried it. I've tried both 1.4.x and 1.5.x (which had gthread-aio), and it doesn't really matter; the installer simply doesn't like Lighttpd and I don't understand why... There is nothing special going on here; it's just serving a bunch of binary files.
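Roughly, the difference between the two backends: with sendfile the kernel copies file data straight from the page cache to the socket, while a writev-style backend hands the kernel several user-space buffers (file data already read or mapped into memory) in a single call. The Python sketch below only illustrates that gather-write idea with `os.writev`; it is not lighttpd's implementation, and `send_response_writev` is a hypothetical helper that builds a minimal HTTP/1.0 response.

```python
import os
import socket


def send_response_writev(sock: socket.socket, path: str) -> int:
    """Send a minimal HTTP/1.0 response for `path` using one writev() call.

    The file is read into user space first, so sendfile() is never involved.
    Returns the number of bytes writev() reported as written.
    """
    with open(path, "rb") as f:
        body = f.read()
    header = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    # Gather header and body into a single system call. A real server must
    # handle short writes (e.g. on non-blocking sockets) by checking the
    # return value and resending the remainder.
    return os.writev(sock.fileno(), [header, body])
```

Because the data passes through user space, this path does not depend on the filesystem supporting sendfile, which may be why switching the backend changed the behaviour.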
Updated by admin almost 17 years ago
I'm asking because there could also be a bug in the client you're using.
Updated by Anonymous almost 17 years ago
I can recreate it easily enough and get a packet dump of it for you tonight.
-- felderado
Updated by admin almost 17 years ago
Have you managed to create the trace already?
Updated by stbuehler almost 17 years ago
- Did the sendfile backend work for at least one request, or did it fail every time? Perhaps sendfile is not supported on your filesystem.
- What did the error log say for "writev" / "gthread-aio"?
- Traces are of course useful. Configs too.
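One quick way to test the "sendfile is not supported on your filesystem" theory is to call sendfile(2) directly on a file from the mirror's document root. The Python sketch below (a hypothetical `sendfile_works` helper, Linux assumed) copies the first few KB of a file into a local socket pair with `os.sendfile` and compares the result with a plain read. An `OSError` such as EIO here would suggest the problem sits below lighttpd, in the filesystem or storage layer.

```python
import os
import socket


def sendfile_works(path: str, nbytes: int = 4096) -> bool:
    """Return True if sendfile() copies the start of `path` into a socket intact.

    nbytes is kept small so the data fits in the socket buffer and the
    blocking sendfile() call cannot stall waiting for a reader.
    """
    with open(path, "rb") as f:
        expected = f.read(nbytes)
        a, b = socket.socketpair()
        with a, b:
            try:
                # Explicit offset 0: sendfile reads from the start of the file
                # regardless of the position left by f.read() above.
                sent = os.sendfile(a.fileno(), f.fileno(), 0, len(expected))
            except OSError:
                # e.g. EIO (as in lighttpd's error.log) or EINVAL
                return False
            received = b""
            while len(received) < sent:
                received += b.recv(sent - len(received))
    return received == expected[:sent]
```

Running it against the .deb files the installer was fetching when it failed would be the most relevant check.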
Updated by Anonymous almost 17 years ago
I can confirm this. A stock Ubuntu 7.10 AMD64 Server install, booting from the network using PXE and then installing from a local mirror running a stock lighttpd install from the Gutsy 7.10 repository (1.4.18-1ubuntu1), fails with the exact error the original poster mentioned.
I installed Apache prefork - also a standard install. The only change I made to apache was to increase MaxSpareServers to 30. It worked perfectly from the same doc_root first time.
With lighttpd it consistently failed at exactly the same point in the install. FYI, this was running on a Dell 2950 with the built-in NIC on a gigabit Ethernet switch, with both the mirror and the machine being installed on the same LAN.
Lighttpd wrote nothing to the error log, and there was nothing useful in the access log either. I also checked the system logs and found nothing.
This is very troubling because we run lighttpd as a front-end reverse proxy in our production environment and it processes well over 150 requests per second. So I'm wondering if requests are quietly failing under high load.
This error is 100% reproducible in our data center in a racked environment with a Dell gigabit switch and Dell 2950 servers. I ran a similar config in our office (same OS, also 64-bit, with lighttpd) and it worked fine. The only difference was that the mirror machine was not a server-class machine, though it was also AMD64. The mirror machine was also on a 100 Mbit port with a 100 Mbit NIC, while the server had a 1 Gbit NIC, so perhaps the load wasn't high enough to trigger this problem.
If I have time and two spare 2950s, I'll try to reproduce this and debug it in more detail.
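To reproduce this outside the installer, a small load generator can hit the mirror with many concurrent GETs and report any request that fails or returns a truncated body. The Python sketch below is a stand-in under stated assumptions, not what the installer does: apt's http method reuses a persistent (possibly pipelined) connection, whereas this uses separate connections from a thread pool. `fetch_one` and `hammer` are hypothetical names, and the URL list is something you would build from the mirror's pool directory.

```python
import concurrent.futures
import http.client
import urllib.request
from typing import List, Optional, Tuple


def fetch_one(url: str, timeout: float = 10.0) -> Optional[str]:
    """Download url completely; return None on success, else an error string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # raises http.client.IncompleteRead on a truncated body
        return None
    except (OSError, http.client.HTTPException) as exc:
        # URLError/HTTPError, connection resets and timeouts are OSErrors;
        # IncompleteRead and BadStatusLine are HTTPExceptions.
        return repr(exc)


def hammer(urls: List[str], workers: int = 8) -> List[Tuple[str, str]]:
    """Fetch all urls concurrently; return (url, error) for every failure."""
    failures: List[Tuple[str, str]] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        for url, err in zip(urls, pool.map(fetch_one, urls)):
            if err is not None:
                failures.append((url, err))
    return failures
```

Pointing it at http://192.168.1.144/ with a list of package paths, first with lighttpd and then with Apache, would show whether the failures depend on the client or only on the server.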
Mark.
-- mmaunder
Updated by stbuehler about 16 years ago
- Status changed from New to Fixed
- Resolution set to worksforme
I am sorry, but without more information we cannot help you.
Updated by stbuehler about 16 years ago
- Status changed from Fixed to Missing Feedback