Bug #673

Connection error on Solaris

Added by Anonymous over 8 years ago. Updated about 6 years ago.

Status: Fixed
Priority: Normal
Assignee: -
Category: core
Target version: 1.4.21
Start date:
Due date:
% Done: 0%
Missing in 1.5.x:

Description

As described in the forum, where it received no response.

I have lighttpd-1.4.11 compiled on solaris 10 using gcc 3.3.2. I am serving nothing but static files and do not have anything but mod_status and mod_accesslog enabled.

I start lighttpd and some connections get made, others, including a heartbeat request, block and
the following error is dumped in the error log:


2006-06-05 16:32:18: (connections.c.222) unexpected end-of-file: 10

This server is used in a very high traffic situation and I'm looking to replace Apache in order
to get more simultaneous clients. I have added the following in the config:


server.max-fds = 8192
server.max-keep-alive-requests = 5000
server.max-keep-alive-idle = 90

as a config similar to this on a Linux box works very well for me.

I think lighttpd is fantastic, but this is a major blocker for me when trying to get it
working on Solaris.

Regards
Stephen

-- lighttpd


Related issues

Duplicated by Bug #1990: Solaris ioctl +"unexpected end of file" Fixed 2009-05-25

History

#1 Updated by Anonymous almost 8 years ago

ioctl may return any negative value as an error. I changed the line to check for a value less than 0. Granted, it hasn't been long since I made the change, but I've not seen this error log message appear yet.


Line 221:
  if (ioctl(con->fd, FIONREAD, &toread) < 0) {

-- joe
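A minimal sketch of the check described in this comment, pulled out into a standalone function so it can be tried outside lighttpd. The name bytes_pending is hypothetical and is not part of the lighttpd source; the only behavior taken from the comment is that any negative ioctl() return is treated as a failure.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>   /* FIONREAD on Linux/BSD; Solaris declares it in <sys/filio.h> */
#include <unistd.h>

/* Hypothetical helper (not lighttpd code): ask the kernel how many bytes
 * are queued for reading on fd.
 * Returns 0 and stores the count in *pending on success.
 * Returns -1 if ioctl() reports any negative value; errno is left as set. */
int bytes_pending(int fd, int *pending)
{
    int n = 0;

    assert(pending != NULL);

    if (ioctl(fd, FIONREAD, &n) < 0) {
        return -1;       /* caller decides whether this is fatal */
    }

    *pending = n;
    return 0;
}
```

A caller that gets -1 here does not have to drop the connection; as the later comments in this thread suggest, it can still attempt the read and let read() report the real state of the socket.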

#2 Updated by Anonymous almost 8 years ago

The error message appears less often now. When it does occur, it's logged as the remote host dropping the connection or as a broken pipe. I assume this is because the browser disappeared, but there could still be a bug involved.

-- joe

#3 Updated by Anonymous almost 8 years ago

Follow-up: The code change helped, but not much. However, after digging around SunSolve and OpenSolaris, I've turned up some information that recommends changing a setting to work around the problem. The links are below. The setting is:

ndd -set /dev/tcp tcp_co_min 1500

1500 = MTU of your network interface card.

http://sunsolve.sun.com/search/document.do?assetkey=1-1-4701102-1

http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=2387e881a19c7affffffffdbf791ee9a8d6b1?bug_id=4789772

-- joe

#4 Updated by ingenthr over 7 years ago

After looking into this for a customer, I can say pretty confidently that the cause is not bug 4701102, as it was fixed back in 2003 and the changes for that fix are still in current Solaris/OpenSolaris code.

I checked with another engineer and have learned this may just be incorrect error handling with the stream when using the devpoll backend. In other words, with this ioctl(), it's entirely possible to get an error but still have the stream readable. The best fix would probably be to change the error handling to anticipate a possible failure of this ioctl() when using this type of socket. The failure of this ioctl() in this case is not an indication of error.

#5 Updated by ingenthr over 7 years ago

One other note: this was investigated with 1.4.15, but I also looked at a couple of files in 1.4.11, and it doesn't appear the behavior in this area has changed at all.

#6 Updated by Anonymous over 7 years ago

Hello ingenthr. So you would recommend simply ignoring the return value of ioctl() altogether?

-- joe

#7 Updated by ingenthr over 7 years ago

I believe so. After checking with another engineer to verify, we believe that ioctl() is not necessary with this nonblocking stream socket, and the error message therefore isn't required either. It can then fall through to the buffer code and the read.

In fact, it may not be necessary in the Linux epoll or poll cases either. This style of check is normally not used with a nonblocking socket. Dropping it would remove a syscall in the critical path here. If whatever event mechanism (devpoll, epoll, poll()) says there's data there, it should be safe to do a read and check for errors from there. There could be something I'm not aware of on other implementations.
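The read-and-check-errors approach described above can be sketched roughly as follows. This is an illustration only, not the change that went into lighttpd: the rd_status codes and the read_ready name are hypothetical, and the sketch assumes fd has already been reported readable by the event mechanism and, in a real server, has been put in nonblocking mode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not lighttpd's own. */
typedef enum { RD_DATA, RD_WOULD_BLOCK, RD_EOF, RD_ERROR } rd_status;

/* Read once from a descriptor without asking FIONREAD first.
 * On RD_DATA, *got holds the number of bytes stored in buf. */
rd_status read_ready(int fd, char *buf, size_t cap, size_t *got)
{
    ssize_t n;

    assert(buf != NULL && got != NULL && cap > 0);
    *got = 0;

    do {
        n = read(fd, buf, cap);
    } while (n < 0 && errno == EINTR);   /* retry if a signal interrupted the call */

    if (n > 0) {
        *got = (size_t)n;
        return RD_DATA;
    }
    if (n == 0) {
        return RD_EOF;                   /* peer closed the connection */
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        return RD_WOULD_BLOCK;           /* nothing there after all; wait for the next event */
    }
    return RD_ERROR;                     /* e.g. ECONNRESET */
}
```

With this shape the only syscall on the hot path is read() itself, and a wakeup with no data behind it shows up as RD_WOULD_BLOCK on a nonblocking socket rather than as a logged error.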

The one thing I'm certain of is that it is not related to bugid 4701102 or 4789772. Implementing the workaround from those bug descriptions on a very busy system had no effect, not to mention that both have been closed and integrated for a couple of years. If it was that bug, the cause of which was notification propagating before data was available at the stream head, setting tcp_co_min to the MTU (or higher) would mean you couldn't get into that condition. It would, though, also have a negative effect on performance, so the workaround was more of a test to verify where the error was than a proper workaround. The fix was straightforward, and you can see it in the OpenSolaris code for tcp.c still to this day.

Do you still see those messages occasionally on your system as well? I would imagine you probably do, since we saw them even though the workaround was in place.

By the way, I'm matt dot ingenthron at sun dot com if you'd like to discuss directly and update the bug as needed.

#8 Updated by stbuehler about 6 years ago

  • Status changed from New to Fixed
  • Resolution set to fixed

Fixed in r2317
