High system load
Added by dutsmiller about 16 years ago
I have a fairly busy Lighttpd farm (500 requests/sec per server) running on Debian Lenny. What I'm noticing is that after several days of normal operation, the load on the servers keeps creeping up to the point where cpu use is approaching 100%. Interestingly enough, the high cpu use is reported as %sys as opposed to %user.
# mpstat 1 3
Linux 2.6.26-1-amd64 04/20/2009 x86_64
12:16:55 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
12:16:56 PM all 6.06 0.00 81.82 5.05 1.01 2.02 0.00 4.04 3447.47
12:16:57 PM all 5.00 0.00 82.00 2.00 1.00 3.00 0.00 7.00 3081.00
12:16:58 PM all 3.03 0.00 74.75 12.12 0.00 3.03 0.00 7.07 3468.69
I have tried removing the load-balanced traffic from the server, which makes the CPU use and load drop right down to zero, and the server-status page then shows zero open connections. However, when I move the traffic back onto the server, the load and CPU use go immediately back up. The only thing that works is restarting lighttpd, after which the load drops back to a normal level for the traffic going through the server (<25%). The load then builds steadily over a matter of days until I am forced to restart the process once again. I am not sure if there is a setting in Linux that I need to tweak or if it's a lighttpd issue. I have tried lighttpd 1.4.20, 1.4.22 and the latest 1.5 cvs build.
The servers are only used to serve out static files with no cgi/php support at all.
The options I have set in lighttpd.conf are:
server.modules = (
"mod_access",
"mod_accesslog",
"mod_redirect",
"mod_rewrite",
"mod_status",
"mod_compress",
)
server.max-keep-alive-requests = 0
server.max-fds = 100000
And the only sysctl.conf options are:
net.ipv4.tcp_fin_timeout = 1
net.ipv4.tcp_tw_recycle = 1
net.core.rmem_max = 16777216
net.core.rmem_default = 16777216
net.core.netdev_max_backlog = 262144
net.core.somaxconn = 262144
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
Any suggestions are welcomed.
-Tim
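Since the %sys growth described above happens over days, it may help to log it continuously rather than rely on occasional mpstat runs. The following is a minimal Python sketch, not something from the original post: it assumes a Linux /proc/stat with the usual field order (user, nice, system, idle, iowait, irq, softirq, steal), and the function names are made up for illustration.

```python
import time


def read_cpu_times(path="/proc/stat"):
    """Return the aggregate 'cpu' counters from /proc/stat as a list of ints."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line found in " + path)


def cpu_percentages(before, after):
    """Convert two counter samples into %user and %sys over the interval."""
    # Only the first eight fields are summed; later fields (guest time)
    # are already included in 'user' and would be double counted.
    delta = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(delta)
    if total <= 0:
        return {"user": 0.0, "sys": 0.0}
    # Assumed field order: user, nice, system, idle, iowait, irq, softirq, steal
    return {
        "user": 100.0 * delta[0] / total,
        "sys": 100.0 * delta[2] / total,
    }


def log_sys_usage(interval=60, samples=3):
    """Print a timestamped %user/%sys line once per interval."""
    prev = read_cpu_times()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_cpu_times()
        pct = cpu_percentages(prev, cur)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(f"{stamp} user={pct['user']:.1f} sys={pct['sys']:.1f}")
        prev = cur
```

Calling log_sys_usage(interval=60, samples=1440) would give roughly one day of per-minute readings, which should be enough to see whether %sys climbs steadily or in steps.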
Replies (5)
RE: High system load - Added by dutsmiller about 16 years ago
I have done some more troubleshooting and it seems to be an issue with Debian Lenny. I installed Debian Etch on one of the servers and with the exact same configuration/tweaks, the load is considerably lower and does not increase over time. I don't have any idea what could be causing the issues on Debian Lenny, but I'll post back here if I figure it out.
RE: High system load - Added by lix about 16 years ago
Do you know dstat?
http://dag.wieers.com/home-made/dstat/
You could try it and show the output.
dstat -cgilpymn
Also, here is my current sysctl.conf:
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 0
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
####################################################
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.optmem_max=16777216
net.core.netdev_max_backlog=2500
net.ipv4.tcp_max_tw_buckets=131072
net.ipv4.tcp_rmem='4096 65536 16777216'
net.ipv4.tcp_wmem='4096 87380 16777216'
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_timestamps=0
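With two sysctl.conf variants now in the thread, a quick way to see where they actually disagree is to parse both and list the keys they share with different values. This is only a sketch in Python; the helper names are hypothetical, and it assumes the simple "key = value" format shown above.

```python
def parse_sysctl(text):
    """Parse sysctl.conf-style text into a dict of key -> value strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (including '####' separators).
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Collapse internal whitespace so '4096  65536' equals '4096 65536'.
        settings[key.strip()] = " ".join(value.split())
    return settings


def differing(a, b):
    """Return {key: (value_in_a, value_in_b)} for shared keys that differ."""
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}
```

Feeding it the contents of the two files would report only the shared keys whose values differ, for example net.ipv4.tcp_fin_timeout (1 versus 5 in the two posts).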
RE: High system load - Added by dutsmiller about 16 years ago
Here is what I came up with from the two servers. Again, these are running identical hardware, with the only difference being the operating system. As you can see below, sys% is far higher on Lenny (13-24% in these samples, versus 3-9% on Etch), and that is the part that will continue to rise over a matter of days. Oddly enough, I have Apache and MySQL servers running on Lenny with no problems whatsoever, so it's some mixture of Lenny and Lighttpd that is causing the increased sys load.
Debian Etch
----total-cpu-usage---- ---paging-- interrupts ---load-avg-- ---procs--- ---system-- ------memory-usage----- net/total
usr sys idl wai hiq siq| in out | 217 233 | 1m 5m 15m |run blk new| int csw | used buff cach free| recv send
2 4 87 5 0 3| 0 0 | 2 4587 | 0.1 0.1 0.1| 0 0 0|6066 1525 | 167M 9328k 1815M 21M| 0 0
2 9 67 20 0 3| 0 0 | 0 6822 | 0.1 0.1 0.1| 0 0 0|8631 2499 | 167M 9328k 1815M 21M| 665k 2537k
4 6 66 21 0 3| 0 0 | 4 6258 | 0.1 0.1 0.1| 0 1 0|8519 2203 | 167M 9328k 1817M 19M|2587k 2399k
2 6 61 28 1 2| 0 0 | 0 5373 | 0.1 0.1 0.1| 0 0 0|7537 1640 | 167M 9328k 1819M 17M| 489k 2576k
2 7 70 19 0 2| 0 0 | 5 5446 | 0.1 0.1 0.1| 0 1 0|7649 1990 | 167M 9344k 1819M 17M|2434k 2014k
1 4 75 18 0 2| 0 0 | 2 4781 | 0.1 0.1 0.1| 0 0 0|6622 1812 | 167M 9352k 1820M 17M| 428k 2422k
0 3 81 15 0 1| 0 0 | 0 2933 | 0.1 0.1 0.1| 0 0 0|4144 1011 | 167M 9352k 1820M 17M| 355k 1598k
1 4 90 4 0 1| 0 0 | 0 2737 | 0.1 0.1 0.1| 0 0 0|3684 964 | 167M 9352k 1820M 17M| 190k 1049k
2 8 76 9 0 5| 0 0 | 0 7922 | 0.1 0.1 0.1| 0 0 0| 10k 2798 | 167M 9352k 1820M 17M| 224k 1595k
Debian Lenny
----total-cpu-usage---- ---paging-- interrupts ---load-avg-- ---procs--- ---system-- ------memory-usage----- net/total
usr sys idl wai hiq siq| in out | 19 21 | 1m 5m 15m |run blk new| int csw | used buff cach free| recv send
2 16 71 8 1 2| 0 0 | 673 2 | 0.3 0.4 0.4| 0 0 0|3054 1467 | 155M 8392k 1833M 19M| 0 0
4 14 58 20 2 2| 0 0 |1177 0 | 0.3 0.4 0.4| 1 0 0|4362 2057 | 155M 8392k 1834M 18M| 412k 2076k
3 24 46 23 1 2| 0 0 |1193 5 | 0.3 0.4 0.4| 1 1 0|4681 2110 | 155M 8408k 1835M 17M|1491k 2775k
8 20 59 11 1 2| 0 0 |1085 0 | 0.3 0.4 0.4| 1 0 0|4876 1978 | 155M 8408k 1835M 17M|1580k 3654k
4 14 75 4 0 3| 0 0 | 667 0 | 0.3 0.4 0.4| 0 0 0|3182 1626 | 155M 8408k 1835M 17M| 372k 2327k
3 14 69 12 0 2| 0 0 | 940 4 | 0.3 0.4 0.4| 1 0 0|3367 1523 | 155M 8416k 1837M 15M|1273k 2218k
1 15 71 10 1 2| 0 0 | 781 0 | 0.3 0.4 0.4| 1 0 0|3447 1461 | 155M 8376k 1835M 17M|1392k 2105k
4 13 74 4 1 4| 0 0 | 584 2 | 0.3 0.4 0.4| 0 0 0|3014 1245 | 154M 8384k 1836M 16M|1294k 2203k
RE: High system load - Added by icy about 16 years ago
High %sys might indicate that lighty is doing a lot of syscalls. I know it's a fairly busy site but would it be possible to get some strace logs (with timestamps) from both to compare?
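To make such a comparison concrete, one option is to tally syscall names in each log and look at where the counts diverge most. The sketch below is an assumption about how that could be done, not a tool anyone in the thread used; it expects lines in the format strace produces with -tt (a wall-clock timestamp, optionally preceded by a PID), and the function names are invented.

```python
import re
from collections import Counter

# Optional PID prefix ("[pid 123] " or "123 "), a -tt style timestamp,
# then the syscall name followed by an opening parenthesis.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+(\w+)\("
)


def count_syscalls(lines):
    """Count how often each syscall name starts a line in an strace log."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # "<... resumed>" continuations and signal lines are skipped
            counts[match.group(1)] += 1
    return counts


def compare(counts_a, counts_b, top=10):
    """Return (syscall, count_a, count_b) rows, largest difference first."""
    names = set(counts_a) | set(counts_b)
    rows = [(name, counts_a[name], counts_b[name]) for name in names]
    rows.sort(key=lambda row: (-abs(row[1] - row[2]), row[0]))
    return rows[:top]
```

Logs captured over the same duration on the Etch and Lenny servers could each be passed through count_syscalls (for instance via open(path) as the line source) and then handed to compare.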
RE: High system load - Added by dutsmiller about 16 years ago
I'd be glad to. The only concern I have is that it's a problem that gets progressively worse so I'm not sure if it'll be immediately visible. For example, if I restart lighty to start it with strace, the load is going to immediately drop and could take a day or so to get back up to where it was. I could maybe try to force the issue by directing more traffic to it with the load balancer, but I'm not sure if that'll do the trick or not. Regardless, it's worth a shot.
I haven't used strace in years, so if you could give me the command you'd like me to run, I'll post a link to the results.
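On the restart concern: strace can attach to an already running process with -p, so lighttpd would not need to be restarted to capture a trace. The Python sketch below shows one way to build and run such a command for a fixed period. It is illustrative only: the output path and function names are placeholders, the PID has to be looked up separately, and attaching normally requires root. The flags used are -f (follow child processes), -tt (timestamps with microseconds), -T (time spent in each syscall), -p (attach to PID) and -o (write to a file).

```python
import signal
import subprocess


def strace_attach_command(pid, output_path):
    """Build an strace command line that attaches to a running process."""
    return ["strace", "-f", "-tt", "-T", "-p", str(pid), "-o", output_path]


def trace_for(pid, seconds, output_path):
    """Attach strace to pid, let it run for a while, then detach."""
    proc = subprocess.Popen(strace_attach_command(pid, output_path))
    try:
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Same effect as pressing Ctrl-C: strace detaches and the
        # traced process keeps running.
        proc.send_signal(signal.SIGINT)
        proc.wait()
    return proc.returncode
```

Tracing a busy server adds noticeable overhead, so a short capture (tens of seconds) on each machine is probably a reasonable starting point.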