Project

General

Profile

Bug #2907

mod_proxy delays subsequent requests up to 5 secs

Added by MindJuice 2 months ago. Updated about 2 months ago.

Status:
Invalid
Priority:
Low
Assignee:
-
Category:
mod_proxy
Target version:
-
Start date:
2018-09-05
Due date:
% Done:

0%

Estimated time:
Missing in 1.5.x:

Description

I have configured lighttpd 1.4.48 to proxy my /api endpoints to a local server running on port 8080 as shown below.

# Forward all /api requests to the apiserver
$HTTP["url"] =~ "^/api" {
  proxy.server = (
    "/api" => (
      (
        "host" => "127.0.0.1",
        "port" => "8080" 
      )
    )
  )
}

I have a web app that polls the same API endpoint every 5 seconds. The issue I am seeing is that the first time, it takes about 600ms to return to the browser. The second time 1.5s. Then 2.7s, then 3.5s, then 4.7s, then 6.1s, then back to about 600ms. This cycle repeats over and over. Essentially, each request takes an extra ~1.1s until it goes over 5 seconds, then it resets.

From the logging on my api server, I can see that the request is not received immediately. For some reason it is not being immediately forwarded by lighttpd.

I found some info on delays that suggested using the following:

server.stream-request-body = 1
server.stream-response-body = 1

With the above values set to 0, every request was consistently delayed by 5 seconds. With it set to 1 or 2, I get the increasing/cycling delays I described above.

Does anyone else have a similar setup that works properly and could share their config? Does this seem like a misconfiguration on my part or possibly a bug?

Any help would be most appreciated!

Thanks!

lighttpd.conf.tar.gz (89 KB) lighttpd.conf.tar.gz MindJuice, 2018-09-17 17:06
curl-debug-output.txt (4.65 KB) curl-debug-output.txt MindJuice, 2018-09-18 18:43
browser-debug-output.txt (14.7 KB) browser-debug-output.txt MindJuice, 2018-09-18 18:43
lighttpd-log2.txt (17.1 KB) lighttpd-log2.txt MindJuice, 2018-09-25 18:18
lighttpd-log.txt (14.3 KB) lighttpd-log.txt MindJuice, 2018-09-25 18:18

History

#1

Updated by gstrauss 2 months ago

Have you looked to see if this is a problem with your backend? Please try writing a simple "Hello World" style backend, test with lighttpd 1.4.50, and see if you can still reproduce the issue.

#2

Updated by MindJuice 2 months ago

Hi, thanks for the reply.

I should have also mentioned that if I send a request directly to the backend server on port 8080, the backend server responds instantly. If I send the request to port 80, and it gets proxied to port 8080, then it has the delay.

Updating to 1.4.50 would be a bit of work, as I'm using a buildroot Linux and would need to rebuild everything, which takes 12 hours on my crazy slow Linux VM. Is there some recent fix you think might be related?

#3

Updated by gstrauss 2 months ago

The config snippet that you have presented is fairly simple, and you're the only one who has reported this. (You have not shared your entire config.) As things are working fine for me on an x86 Linux box, these all tend towards something amiss in your setup. Are you running a modern kernel? Have you looked in the lighttpd error log to see if there is a reason for the 5 second delay? Have you tried turning on additional debugging in the lighttpd config? DebugVariables Are you sure that lighttpd is receiving the web requests in a timely fashion and then delaying before sending to backend? Have you tried basic values:

server.event-handler = "poll" 
server.network-backend = "writev" 

You should try commenting out (disabling) server.defer-accept or server.bsd-accept-filter if you have enabled either.

It does not sound like you wrote a simple "Hello World" program (separate from your API) and tried that. Please try that.
You might also try strace on both lighttpd and your app to see what is being sent and received. strace -ttt to get timestamps along with the syscalls.

#4

Updated by MindJuice 2 months ago

My kernel version is below. It's a custom buildroot for an embedded system:

# uname -r
4.9.87-linux4sam_5.8

I just took the default config from here (https://github.com/lighttpd/lighttpd1.4/tree/master/doc/config), and modified some paths to match my system and added the proxy config shown above (making sure to include proxy.conf). I attached the full set of config files in a .tar.gz.

Unfortunately the system has no strace on it, so I can't try that.

I didn't have server.defer-accept or server.bsd-accept-filter enabled.

I swapped in the different values for server.network-backend with no noticeable difference.

I also turned on logging to syslog, which shows a lot more info than what was going to the actual lighttpd log files before. What I see is the request is sent by the browser, but nothing shows up in the syslog immediately. Then, after a few seconds, I see the following output, and the request goes on to be handled as expected:

(gw_backend.c.308) released proc: pid: 0 socket: tcp:127.0.0.1:8080 load: 1
... elided condition evaluations
(gw_backend.c.933) gw - found a host 127.0.0.1 8080
(gw_backend.c.2405) handling it in mod_gw

It seems that there is some backend process that is being kept alive, and a new request cannot be started until the last one is complete. No idea yet why that is the case.

I'll try to make a simple hello world server later today or tomorrow, but wanted to share the above in case it helps shed some light.

#5

Updated by gstrauss 2 months ago

Too bad that you do not have strace.

However, now that you've turned on the debug flags, look for some more messages from gw_backend.c. Does it think that the backend is unresponsive and has left it alone for a couple seconds before sending the backend more requests? Does your backend set a listen() backlog of 1? (not recommended)

#6

Updated by MindJuice 2 months ago

I just tried a very simplified version of my API server that only responds to a single route with a hard-coded "Hello!" string.

Curling that URL returns the response immediately, whether I do it through the proxy (port 80 to 8080) or directly to port 8080.

If I try to load the same URL through a browser (tried Chrome and Safari on Mac) with the simple "hello" server running, then I see the delay again. It's consistently around 5 seconds.

Seeing this, I switched back to my full API server and curled the same URL. There is no delay through lighttp in that case either. It's only when requesting through the browser for some reason.

BAD: Refreshing the following URL in the browser repeatedly results in consistent times around 5 seconds to fetch the response:

http://192.168.1.35/api/status/dashboard

GOOD: Refreshing the following URL in the browser repeatedly results in consistent times under 20ms to fetch the response (bypasses lighttp proxy):

http://192.168.1.35:8080/api/status/dashboard

The main difference I see in the log output between curling and the browser (both to port 80) is that the browser requests the favicon immediately after the main URL is requested. I've attached the output in two separate files. I added some blank lines in browser-debug-output.txt to show where the different sections are between the two requests. In the browser version, you can see a line near the end that says:

Jan  1 00:44:30 system daemon.err lighttpd[182]: (server.c.1809) connection closed - keep-alive timeout: 10

That line comes 5 seconds after the response is received from the API server. There is a second line like that a further 5 seconds later. Those two lines don't appear when requesting the URL via curl.

Regarding the backlog setting, it looks like it's set to 1024: https://github.com/lighttpd/lighttpd1.4/blob/82dcb34c730aabb2db32ad432a8ca45cc8c79846/src/gw_backend.c#L1300

Looking into server.c further now.

#7

Updated by gstrauss about 2 months ago

Please try using curl with two URLs on the command line to simulate what the browser is doing with the pipelined HTTP/1.1 request:

curl http://192.168.1.35/api/status/dashboard http://192.168.1.35/favicon.ico

If you enable mod_accesslog, what is the timestamp on lighttpd's response to the browser's second request for favicon.ico?

#8

Updated by MindJuice about 2 months ago

I've attached a couple of log files.

lighttpd-log.txt contains the logs for curl http://192.168.1.35/api/status/dashboard http://192.168.1.35/favicon.ico

lighttpd-log2.txt contains similar logs when the dashboard is in the steady state of making the /api/status/dashboard request every 5 seconds.

mod_access output is included in the logs.

It looks like the processing of the request for favicon.ico is not started until 5 seconds after the response to the /api/status/dashboard was sent.

#9

Updated by gstrauss about 2 months ago

Looks like your Crow/0.1 server is not closing the connection for 5 seconds after the response starts, even though lighttpd sends "Connection: close" in the request to the backend

#10

Updated by MindJuice about 2 months ago

Oh, interesting. Thank you. I will look into that and report back.

#11

Updated by gstrauss about 2 months ago

  • Status changed from New to Invalid
  • Priority changed from Normal to Low
  • Target version deleted (1.4.x)

If sending a pipelined request directly to your backend exhibits the same behavior, the problem is definitely with your backend.
curl http://192.168.1.35:8080/api/status/dashboard http://192.168.1.35:8080/favicon.ico
If your backend does not support request for favicon.ico, just make two pipelined requests for api/status/dashboard
If your backend does not support pipelined requests, then strace -ttt your backend and look when it closes the socket after servicing the request.

Please do more testing before considering opening bugs, and instead of opening bugs, please post questions (like this ticket) in the forums. (See Forums tab in heading menubar)

Also available in: Atom