Bug #673: Connection error on Solaris - Lighttpd - lighty labs

Bug #673

closed

Connection error on Solaris

Added by Anonymous almost 19 years ago. Updated over 16 years ago.

Status: Fixed
Priority: Normal
Category: core
Target version: 1.4.21

Description

I described this problem in the forum but got no response.

I have lighttpd-1.4.11 compiled on Solaris 10 using gcc 3.3.2. I am serving nothing but static files and have no modules enabled other than mod_status and mod_accesslog.

When I start lighttpd, some connections succeed, but others, including a heartbeat request, block, and
the following error appears in the error log:


2006-06-05 16:32:18: (connections.c.222) unexpected end-of-file: 10

This server is used in a very high traffic situation and I'm looking to replace Apache in order
to get more simultaneous clients. I have added the following in the config:


server.max-fds = 8192
server.max-keep-alive-requests = 5000
server.max-keep-alive-idle = 90

as a config similar to this on a Linux box works very well for me.

I think lighttpd is fantastic, but this is a major blocker for me when trying to get it
working on Solaris.

Regards
Stephen

-- lighttpd

Updated by Anonymous over 18 years ago

ioctl may return any negative value as an error. I changed the line to check for a value less than 0. Granted, it hasn't been long since I made the change, but I haven't seen this error message in the log since.


Line 221:
  if (ioctl(con->fd, FIONREAD, &toread) < 0) {

-- joe

Updated by Anonymous over 18 years ago

The error message appears less often now. When it does occur, it is logged as the remote host dropping the connection or as a broken pipe. I assume this is because the browser disappeared, but there could still be a bug involved.

-- joe

Updated by Anonymous over 18 years ago

Follow-up: The code change helped, but not much. However, after digging around SunSolve and OpenSolaris, I turned up some information that suggests changing a TCP setting to work around the problem. The links are below. The setting is:

ndd -set /dev/tcp tcp_co_min 1500

1500 = MTU of your network interface card.

http://sunsolve.sun.com/search/document.do?assetkey=1-1-4701102-1

http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=2387e881a19c7affffffffdbf791ee9a8d6b1?bug_id=4789772

-- joe

Updated by ingenthr almost 18 years ago

After looking into this for a customer, I can say pretty confidently that the cause is not bug 4701102, as it was fixed back in 2003 and the changes for that fix are still in current Solaris/OpenSolaris code.

I checked with another engineer and have learned this may just be incorrect error handling with the stream when using the devpoll backend. In other words, with this ioctl(), it's entirely possible to get an error but still have the stream readable. The best fix would probably be to change the error handling to anticipate a possible failure of this ioctl() when using this type of socket. The failure of this ioctl() in this case is not an indication of an error.
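
A minimal sketch of error handling that anticipates a failing ioctl(), assuming the FIONREAD result is only used as a hint for how much to read. The names pick_read_size and DEFAULT_READ_SIZE are hypothetical and are not taken from lighttpd; this is not the change that was eventually committed.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#ifdef __sun
#include <sys/filio.h>   /* FIONREAD lives here on Solaris */
#endif

/* Hypothetical fallback size used when FIONREAD gives no usable answer. */
#define DEFAULT_READ_SIZE 4096

/*
 * Treat FIONREAD as advisory. If the ioctl() fails, or reports nothing
 * pending, do not declare the connection broken; fall back to a default
 * buffer size and let the following read() report the real state.
 */
static size_t pick_read_size(int fd)
{
    int toread = 0;

    if (ioctl(fd, FIONREAD, &toread) < 0 || toread <= 0) {
        return DEFAULT_READ_SIZE;
    }
    return (size_t)toread;
}
```

With this shape, a failed ioctl() never produces the "unexpected end-of-file" log line by itself; only the result of the subsequent read() decides whether the connection is closed.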

Updated by ingenthr almost 18 years ago

One other note: this was investigated with 1.4.15, but I also looked at a couple of files in 1.4.11, and the behavior in this area does not appear to have changed at all.

Updated by Anonymous almost 18 years ago

Hello ingenthr. So you would recommend simply ignoring the return value of ioctl() altogether?

-- joe

Updated by ingenthr almost 18 years ago

I believe so. After checking with another engineer to verify, we believe that ioctl() is not necessary with this nonblocking stream socket, and the error message therefore isn't required either. It can then fall through to the buffer code and the read.

In fact, it may not be necessary in the Linux epoll or poll cases either. This style of check is normally not used with a nonblocking socket. Dropping it would remove a syscall in the critical path here. If whatever event mechanism (devpoll, epoll, poll()) says there's data there, it should be safe to do a read and check for errors from there. There could be something I'm not aware of on other implementations.

The one thing I'm certain of is that it is not related to bugid 4701102 or 4789772. Judging by the descriptions of those bugs, and given that implementing the workaround on a very busy system had no effect, they do not apply here; both have also been closed and integrated for a couple of years. If it were that bug, which was caused by the notification propagating before data was available at the stream head, setting tcp_co_min to the MTU (or higher) would mean you couldn't get into that condition. It would also hurt performance, though, so the workaround was more a test to verify where the error was than a proper workaround. The fix was straightforward, and you can still see it in the OpenSolaris code for tcp.c to this day.

Do you still see those messages occasionally on your system as well? I would imagine you probably do, since we saw them even though the workaround was in place.

By the way, I'm matt dot ingenthron at sun dot com if you'd like to discuss directly and update the bug as needed.

Updated by stbuehler over 16 years ago

Status changed from New to Fixed
Resolution set to fixed

Fixed in r2317
