Problems with scaling

Added by argyleblanket over 16 years ago

I've got a bunch of servers running lighttpd 1.4.20 and I'm trying to maximise throughput on the network interface. On one class of server this is not a problem: I can push it up to 100Mb/s without an issue. On the other class of server, which has the same configuration, I can't get past 80Mb/s (95th percentile). It will push up to and slightly over 80Mb/s for a short period of time, then drop significantly below that (to around 30Mb/s), as if in response to being pushed too far. Here are the particulars:

OS: FC6
Hardware (working class of servers): Intel P4 3 GHz, 1GB RAM
Hardware (problem class of servers): Dual quad-core Intel Xeon 1.6 GHz, 3GB RAM (a later purchase, hence the higher spec.)
lighttpd version: 1.4.20 (started on 1.4.18 and upgraded in an attempt to resolve this, but no luck).

Servers are serving a mix of large video files and smaller thumbnails. The majority of the requests are for thumbnails (probably a 40:1 ratio).

Any help would be greatly appreciated. I've got a total of 80Mb/s I can't use because of this, which ain't cheap.

Output of vmstat on a working server:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1     60  25160  14092 590648    0    0    87    34    1    2  9 19 45 28   0
 0  1     60  33132  14228 585256    0    0 10171   882 10642  229  1 25 10 64  0
 0  1     60  11032  13928 608972    0    0 10971    94 9432  170  1 22  3 74   0
 0  1     60  12304  13268 606416    0    0 10831     2 10516  152  1 24  2 73  0

Output of vmstat on a problem server:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  3     68 125468  37164 2972584    0    0    15     3    0    0  5  1 87  8  0
 1  1     68 126848  36116 2969212    0    0  8802   317 9784  212  0  1 87 12  0
 0  3     68 125456  35856 2978272    0    0  5914  1379 6716  201  0  0 87 12  0
 0  3     68 125060  35732 2978272    0    0  9055   279 8295  217  0  1 87 12  0

Config follows (paths, mod_secdownload specifics and mimetypes left out for the sake of security and brevity):

server.modules = ( "mod_access", "mod_status", "mod_setenv", "mod_secdownload", "mod_accesslog" )

server.max-keep-alive-requests = 4
server.max-keep-alive-idle = 4
server.event-handler = "linux-sysepoll"
server.max-fds = 2048
server.stat-cache-engine = "fam"
server.follow-symlink = "enable"
server.network-backend = "writev"

Thanks for your help.

Replies (6)

RE: Problems with scaling - Added by darix over 16 years ago

First of all, a general note: upgrade your OS. FC6 is out of maintenance.
Second, have you tried sendfile?

RE: Problems with scaling - Added by argyleblanket over 16 years ago

Tried sendfile. No luck, and it doesn't explain why the lower-spec servers, on the same OS, are able to scale while the ones I'm having trouble with aren't.

RE: Problems with scaling - Added by jan over 16 years ago

Please add the output of:

$ iostat -x 1

That will give us some idea of which disks are waiting so long. Are any of the requests against a remote file system? Idle plus I/O wait makes up all the CPU time.
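
To make the idle-plus-I/O-wait observation concrete, here is a minimal sketch that pulls the CPU columns out of vmstat output and adds `id` and `wa` per sample. The rows are copied from the vmstat output earlier in this thread; the function names `cpu_breakdown` and `idle_plus_iowait` are made up for illustration and are not part of any tool.

```python
# Rows copied from the vmstat output posted above (problem server, then working server).
PROBLEM = """
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  3     68 125468  37164 2972584    0    0    15     3    0    0  5  1 87  8  0
 1  1     68 126848  36116 2969212    0    0  8802   317 9784  212  0  1 87 12  0
"""
WORKING = """
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1     60  33132  14228 585256    0    0 10171   882 10642  229  1 25 10 64  0
"""


def cpu_breakdown(vmstat_text):
    """Return one dict of CPU percentages (us, sy, id, wa) per vmstat sample row."""
    lines = [ln.split() for ln in vmstat_text.strip().splitlines()]
    # The column-name row is the one starting with "r b"; numeric rows follow it.
    header = next(cols for cols in lines if cols[:2] == ["r", "b"])
    rows = []
    for cols in lines:
        if len(cols) == len(header) and cols[0].isdigit():
            sample = dict(zip(header, map(int, cols)))
            rows.append({k: sample[k] for k in ("us", "sy", "id", "wa")})
    return rows


def idle_plus_iowait(rows):
    """Sum the idle and I/O-wait percentages for each sample."""
    return [row["id"] + row["wa"] for row in rows]


for name, text in (("problem", PROBLEM), ("working", WORKING)):
    print(name, idle_plus_iowait(cpu_breakdown(text)))
```

Note that the first row vmstat prints is an average since boot, so only the later rows reflect the current load. On the second problem-server row, idle plus I/O wait comes to 99%, while the working-server row shows 74% with a noticeably larger `sy` share.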

RE: Problems with scaling - Added by argyleblanket over 16 years ago

Samples from a "good" server that scales well (goodiostat) and a "bad" server that doesn't (badiostat) attached. There are no requests against remote file systems.

badiostat.txt (4.5 KB): Output of iostat from problem server.
goodiostat.txt (8.3 KB): Output of iostat from working server.

RE: Problems with scaling - Added by jan over 16 years ago

We are all fine, %util is 100% for both disks.

But in "badiostat" the rrqm/s (read requests merged per second) is only half the r/s (read requests issued to the device per second). Usually you have more merged requests than actual reads, as reads can be grouped into bundles and handled in one read.

But in your bad case it is the other way around, which makes me assume that either you have bad blocks (and hence have to read blocks twice) or you do something else that triggers extra physical reads on that disk.

Jan
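
For anyone wanting to repeat the rrqm/s versus r/s comparison on their own captures, the following is a rough sketch, assuming the text comes from `iostat -x` and has a `Device` header row containing `rrqm/s` and `r/s` columns. Per the iostat man page, rrqm/s counts read requests merged per second and r/s counts read requests issued to the device per second. The function name `merge_ratios` is hypothetical, and the attachments from this thread are not reproduced here.

```python
def merge_ratios(iostat_text):
    """Yield (device, rrqm/s, r/s, rrqm/s divided by r/s) for each device row.

    Columns are looked up by header name, so the column order does not matter.
    The ratio is None when r/s is zero.
    """
    header = None
    for line in iostat_text.splitlines():
        cols = line.split()
        if not cols:
            header = None              # a blank line ends the current device table
        elif cols[0].startswith("Device"):
            header = cols              # start of a new device table
        elif header and len(cols) == len(header):
            row = dict(zip(header[1:], map(float, cols[1:])))
            reads = row["r/s"]
            ratio = row["rrqm/s"] / reads if reads else None
            yield cols[0], row["rrqm/s"], reads, ratio


# Example use against a saved capture such as the attached badiostat.txt:
#
#     with open("badiostat.txt") as f:
#         for device, rrqm, reads, ratio in merge_ratios(f.read()):
#             print(device, rrqm, reads, ratio)
```

A ratio around 0.5 would match the "only half" pattern described for the problem server. The sketch only reports the numbers; it says nothing about whether bad blocks or something else is the cause.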

RE: Problems with scaling - Added by argyleblanket over 16 years ago

Talked this through with the ISP and eventually upgraded the kernel from 2.6.18 to 2.6.22. iostat still looks like it has problems, but bandwidth usage has improved dramatically. No idea why.
