Problems with scaling
Added by argyleblanket over 16 years ago
I've got a bunch of servers running lighttpd 1.4.20 and I'm trying to maximise throughput on the network interface. On one class of server this is no problem: I can push it up to 100Mb/s without an issue. On the other class of server, which has the same configuration, I can't get past 80Mb/s (95th percentile). It will push up to and slightly over 80Mb/s for a short period, then drop significantly below that (around 30Mb/s), as if in response to being pushed too far. Here are the particulars:
OS: FC6
Hardware (working class of servers): Intel P4 3 GHz, 1GB RAM
Hardware (problem class of servers): Dual quad-core Intel Xeon 1.6 GHz, 3GB RAM (a later purchase, hence the higher spec.)
lighttpd version: 1.4.20 (started on 1.4.18 and upgraded in an attempt to resolve this, but no luck).
Servers are serving a mix of large video files and smaller thumbnails. The majority of the requests are for thumbnails (probably a 40:1 ratio).
Any help would be greatly appreciated. I've got a total of 80Mb/s I can't use because of this, which ain't cheap.
Output of vmstat on a working server:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in   cs us sy id wa st
 0  1     60  25160  14092 590648    0    0    87    34     1    2  9 19 45 28  0
 0  1     60  33132  14228 585256    0    0 10171   882 10642  229  1 25 10 64  0
 0  1     60  11032  13928 608972    0    0 10971    94  9432  170  1 22  3 74  0
 0  1     60  12304  13268 606416    0    0 10831     2 10516  152  1 24  2 73  0
Output of vmstat on a problem server:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache   si   so    bi    bo    in   cs us sy id wa st
 0  3     68 125468  37164 2972584    0    0    15     3     0    0  5  1 87  8  0
 1  1     68 126848  36116 2969212    0    0  8802   317  9784  212  0  1 87 12  0
 0  3     68 125456  35856 2978272    0    0  5914  1379  6716  201  0  0 87 12  0
 0  3     68 125060  35732 2978272    0    0  9055   279  8295  217  0  1 87 12  0
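(The first data row in each sample is the average since boot, which is why it looks so different from the interval rows below it. These were collected with an interval invocation along these lines; the 5-second interval is a guess:)
$ vmstat 5 4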
Config follows (paths, mod_secdownload specifics and mimetypes left out for the sake of security and brevity):
server.modules = ( "mod_access", "mod_status", "mod_setenv", "mod_secdownload", "mod_accesslog" )
server.max-keep-alive-requests = 4
server.max-keep-alive-idle = 4
server.event-handler = "linux-sysepoll"
server.max-fds = 2048
server.stat-cache-engine = "fam"
server.follow-symlink = "enable"
server.network-backend = "writev"
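In case it's relevant: one way to sanity-check that server.max-fds = 2048 isn't a ceiling being hit is to count the descriptors the lighttpd process actually has open. A sketch, assuming a single lighttpd process so pidof returns one PID:
$ ls /proc/$(pidof lighttpd)/fd | wc -l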
Thanks for your help.
Replies (6)
RE: Problems with scaling - Added by darix over 16 years ago
First of all, a general note: upgrade your OS. FC6 is out of maintenance.
Second: have you tried sendfile?
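On 1.4.x that is a one-line change in the config. A sketch, using the Linux backend name from this release series:
server.network-backend = "linux-sendfile"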
RE: Problems with scaling - Added by argyleblanket over 16 years ago
Tried sendfile. No luck, and it doesn't explain why the lower-spec servers with the same OS can scale while the ones I'm having trouble with can't.
RE: Problems with scaling - Added by jan over 16 years ago
Please add the output of:
$ iostat -x 1
That will give us some idea of which disks are waiting so long. Are any of the requests against a remote file system? Idle + io-wait accounts for nearly all of the CPU time.
RE: Problems with scaling - Added by argyleblanket over 16 years ago
Samples from a "good" server that scales well (goodiostat) and a "bad" server that doesn't (badiostat) are attached. There are no requests against remote file systems.
badiostat.txt (4.5 KB) - Output of iostat from problem server.
goodiostat.txt (8.3 KB) - Output of iostat from working server.
RE: Problems with scaling - Added by jan over 16 years ago
So far that looks consistent: %util is 100% for both disks.
But in the "badiostat" output the rrqm/s (read requests merged per second) is only half the r/s (read requests issued to the disk per second). Usually you see more merged requests than actual reads, since adjacent reads can be grouped into bundles and handled in one physical read.
In your bad case it is the other way around, which makes me assume that either you have bad blocks (and hence have to read some blocks twice) or something else is triggering extra physical reads on that disk.
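If you want to test the bad-block theory, something along these lines should do it. A sketch: it assumes the data disk is /dev/sda and that smartmontools is installed, so adjust the device name to your setup:
$ smartctl -a /dev/sda    # check Reallocated_Sector_Ct and Current_Pending_Sector
$ badblocks -sv /dev/sda  # read-only surface scan (slow on a busy disk)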
Jan
RE: Problems with scaling - Added by argyleblanket over 16 years ago
Talked this through with the ISP and eventually upgraded the kernel from 2.6.18 to 2.6.22. iostat still looks like it has problems, but bandwidth usage has improved dramatically. No idea why.
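If anyone else hits this: a kernel upgrade can change the default I/O scheduler and per-device readahead, so comparing those between a good and a bad box might explain it. Just a guess at the mechanism, and it assumes the data disk is /dev/sda:
$ cat /sys/block/sda/queue/scheduler
$ blockdev --getra /dev/sda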