Project

General

Profile

[UE] File download issues

Added by m.magliani@gmail.com over 3 years ago

We work with lighttpd 1.4.49 crosscompiled for an android device (X64 ARM architecture).
When we download files from the server it may happen that the download fails and stops after a while (sometimes immediately).
It doesn't occurs always but just sometimes, more frequently when we use https instead of http.
On the browser side, when the user click to the direct link of the downloadable file, the browser client may show the animated indicator with 0% (or just a little more) of downloading percentage and the only way to overcome the issue is to stop the blocked download and try again.
To show the problem we prepared a for cycle in linux command line to launch several wgets.
In the example below the failure is at the third attempt :

user@userpc01 ~/lighttpd-1.4.49 $ for i in $(seq 1 50); do wget https://10.134.76.39:8181/web/download/update.zip; sleep 2; done

--2019-02-12 16:47:49--  https://10.134.76.39:8181/web/download/update.zip

Connecting to 10.134.76.39:8181... connected.

HTTP request sent, awaiting response... 200 OK

Length: 14072517 (13M) [application/zip]

Saving to: ‘update.zip’

100%[===================================================================================================================================================================================================>] 14.072.517  54,3MB/s   in 0,2s

2019-02-12 16:47:49 (54,3 MB/s) - ‘update.zip’ saved [14072517/14072517]

--2019-02-12 16:50:33--  https://10.134.76.39:8181/web/download/update.zip

Connecting to 10.134.76.39:8181... connected.

HTTP request sent, awaiting response... 200 OK

Length: 14072517 (13M) [application/zip]

Saving to: ‘update.zip.1’

100%[===================================================================================================================================================================================================>] 14.072.517  41,1MB/s   in 0,3s

2019-02-12 16:47:49 (41,1 MB/s) - ‘update.zip.1’ saved [14072517/14072517]

--2019-02-12 16:47:51--  https://10.134.76.39:8181/web/download/update.zip

Connecting to 10.134.76.39:8181... connected.

HTTP request sent, awaiting response... 200 OK

Length: 14072517 (13M) [application/zip]

Saving to: ‘update.zip.2’

61% [======================================================================================================================>                                                                             ] 8.593.408   40,9MB/s             ^

The same problem occurs with the latest 1.4.53 version of lighttpd.
Any idea about a possible reason for this issue?

In attachment lighttpd configuration file.
Thanks.


Replies (17)

RE: File download issues - Added by gstrauss over 3 years ago

Thank you for testing with 1.4.53, too.

From your configuration file, it looks like lighttpd is serving update.zip as a static file. Is that correct? If you comment out mod_fastcgi and mod_proxy from your config, does the problem still persist? (I am trying to isolate where to begin.)

Are there any errors in the lighttpd error log?

If you have strace available, it might be useful to capture what lighttpd is doing for successful requests versus unsuccessful requests.
strace -o /tmp/strace.log -s 1024 lighttpd -D -f /etc/lighttpd/lighttpd.conf ...

What is the output of lighttpd -V ? Was lighttpd built with or without mmap support? (see the output of lighttpd -V) This will help direct us to which code path is serving the file.

Is the system under any load, or are these just serial requests that you are making?

What is the memory usage on the system during these tests? What is the disk space utilization?

This is one setting which can help reduce memory usage on embedded systems. 1024 is the lowest value. Default is 4096.
server.chunkqueue-chunk-sz = 1024

This is the default in lighttpd, but you might also explicitly set
ssl.read-ahead = "disable"
It is recommended that this be disabled for embedded systems with limited memory, and is disabled by default in lighttpd since lighttpd 1.4.46.

You mentioned that this occurs more frequently with openssl. What version of openssl is installed on the system?

If this is reproducible without openssl, you mentioned that it occurs less, but how frequently is that? (With your serial wget, is it reproducible approx once a minute, once an hour, once a day?)

RE: File download issues - Added by m.magliani@gmail.com over 3 years ago

Hi, thanks for the support.

Correct, lighttpd is serving update.zip as a static file.

Excluding both mod_fastcgi and mod_proxy from the config file, using the https I'm not able to dowload the file any more.
Using the http with a script launching 50 serialized wget downloads , there aren't problems.

No errors inside serror.log file.

We tried the approach to launch lighttpd with strace logging, as you suggested.
Unfortunately the issue doesn't occur in this test: the system is heavily busy while writing the log (download rate is relevantly lower too) and it seems that this conditions lead to successfull downloads.
A similar effect had occurred some days ago in other tests: specifically, we decreased MTU size in ethernet interface (from 1500 to 1360) and the issue disappeared.
However this decrease in MTU size was just for testing/debugging purposes and it can't be a permanent setting in our device.

This is the output of lighttpd -V :

ighttpd/1.4.49 (ssl) - a light and fast webserver

Event Handlers:

    + select (generic)
    + poll (Unix)
    + epoll (Linux)
    - /dev/poll (Solaris)
    - eventports (Solaris)
    - kqueue (FreeBSD)
    - libev (generic)

Network handler:

    + linux-sendfile
    - freebsd-sendfile
    - darwin-sendfile
    - solaris-sendfilev
    + writev
    + write
    - mmap support

Features:

    + IPv6 support
    + zlib support
    - bzip2 support
    - crypt support
    + SSL support
    + PCRE support
    - MySQL support
    - PgSQL support
    - DBI support
    - Kerberos support
    - LDAP support
    - memcached support
    - FAM support
    - LUA support
    - xml support
    - SQLite support
    - GDBM support

The system is not under heavy cpu load but we just repeatedly do the wgets/dowload.

About disk size: we have several GigaBytes free in our disk and then its' not a problem related to disk space shortage.
About memory usage: during the wget sequence (some successful and the last one failed) we inspecting the output of "dumpsys meminfo |tail -n 6" and we do not see relevant changes.
The typical output is this :

Total RAM: 2,021,648K (status normal)
 Free RAM:   781,464K (    6,704K cached pss +   722,664K cached kernel +    52,096K free)
 Used RAM:   432,430K (   66,498K used pss +   365,932K kernel)
 Lost RAM:   796,566K
     ZRAM:    14,200K physical used for    44,524K in swap (  520,908K total swap)
   Tuning: 192 (large 512), oom   322,560K, restore limit   107,520K (high-end-gfx)

The issue still remains also setting server.chunkqueue-chunk-sz = 1024 and ssl.read-ahead = "disable".

RE: File download issues - Added by m.magliani@gmail.com over 3 years ago

Sorry, I forgot some info:

openssl version: 1.0.2j

The issue occurs after few wget downloads (a file of about 20MB) in https (sometimes even with the 1st download).
With http, the issue rarely occurs, maybe about once every some hundreds iterations.

RE: File download issues - Added by gstrauss over 3 years ago

Given the output of lighttpd -V, we know that lighttpd is going to default to epoll. Please try reproducing the problem with:
server.event-handler = "poll"

Without strace, or tcpdump, or other trace, it is difficult to ascertain why the response is truncated. Is a TCP RST occurring? Is a socket event being lost? Does lighttpd think it sent the whole file to the kernel, but data gets lost after that? In your mod_accesslog, are you looking to see how much data lighttpd thinks it sent? Does it match up with the amount of data received by wget? On what system are you performing the wget? You might be able to look at the strace/tcpdump traffic there and see if there is anything useful.

Unlikely related, but you might also try
server.network-backend = "linux-sendfile"

For some extra trace in the lighttpd error log, you might try
debug.log-state-handling = "enable"
and post a (compressed) log of the results.

RE: File download issues - Added by m.magliani@gmail.com over 3 years ago

Hi, no relevant changes using server.event-handler = "poll".

On the client side, using Wireshark, we have captured two download sessions, the first one ok and the second one failed.

First download (OK, see download-ok.pcapng) :

--2019-02-18 15:43:17--  https://10.134.77.76/web/download/file.zip
Connecting to 10.134.77.76:443... connected.
  Self-signed certificate encountered.
HTTP request sent, awaiting response... 200 OK
Length: 18310952 (17M) [application/octet-stream]
Saving to: ‘file.zip’

file.zip        100%[=========================================================================>]  17,46M  51,9MB/s    in 0,3s

Second download (KO see download-ko.pcapng, strange messages at package 1679) :

--2019-02-18 15:43:20--  https://10.134.77.76/web/download/file.zip
Connecting to 10.134.77.76:443... connected.
  Self-signed certificate encountered.
HTTP request sent, awaiting response... 200 OK
Length: 18310952 (17M) [application/octet-stream]
Saving to: ‘file.zip’

file.zip         41%[=============================>                                            ]   7,26M  36,3MB/s               ^

In addiction we attach the strace strace-wget-client-side.log, output of a failed wget (client side).

On the server side we collected serror.log (with debug.log-state-handling = "enable" and server.network-backend = "linux-sendfile") for a failed wget.
We notice a sudden interruption of te sequence of write state.
About saccess.log the file represents a sequence of wget (21Mb) where the last one has failed and in that case the file size is wrong.
By the way the downloaded portion of the file.zip in this failed wget, is different from the last size present in the saccess.log.

PS
The log files are in attachment inside log-files.zip.
Wireshark capture files are too big to attach here, so I provide a Dropbox link to download them:
https://www.dropbox.com/s/7hzmsiw5fknl9o3/wireshark-captures.zip?dl=0

RE: File download issues - Added by gstrauss over 3 years ago

First thing that jumps out at me from the strace is of wget, not of lighttpd.

I'll try to take a closer look at the pcaps later.

RE: File download issues - Added by gstrauss over 3 years ago

FYI: I posted a patch to https://redmine.lighttpd.net/boards/2/topics/8491 in which a different TLS-related issue was raised.

RE: File download issues - Added by gstrauss over 3 years ago

I must admit that I am not the sharpest with pcaps, but it looks like the server is attempting to retransmit TCP packets at the end of download-ko.pcapng awaiting ack, after getting some selective acks, which do not occur in the successful download-ok.pcapng.

Would you mind trying to get an strace on the lighttpd process and include timing info (strace -ttt ...) ?

RE: File download issues - Added by m.magliani@gmail.com over 3 years ago

Hi,
we can't reproduce the issue when we use strace (following your hint to include timing info) server side.
The use of strace implies a sort of slowness in the download process that makes the issue disappear.
We tried to remove "-s 1024" option in order to write a reduced amount of data to file, then we limited the syscalls logged with "-e trace=..." options but no way:
the download with strace succeeds every time.
By the way, to avoid slowness, we made an attempt to log to standard output instead of a file but the download succeeds the same.
Then unfortunately we can't provide a strace log of lighttpd (server side) of a failed download.
We point out that the issue arises when using a wget client side but also:
- using a web browser directly targeting the URL;
- using cURL commands (see output below when 3rd cURL fails)

user@userpc ~ $ for i in $(seq 1 50); do curl -k https://10.134.76.23/web/download/file.zip --output file.zip; sleep 2; done
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20.8M  100 20.8M    0     0  51.3M      0 --:--:-- --:--:-- --:--:-- 51.2M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20.8M  100 20.8M    0     0  45.0M      0 --:--:-- --:--:-- --:--:-- 45.0M
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 45 20.8M   45 9680k    0     0   274k      0  0:01:17  0:00:35  0:00:42     0

RE: File download issues - Added by gstrauss over 3 years ago

Unfortunately, I don't have an ARM64 system on which to test.

I have recently been working on some optimizations to the event framework in lighttpd, and also pushed a fix for https://redmine.lighttpd.net/boards/2/topics/8491 to an experimental branch of mine. If you could test git.lighttpd.net unstable branch personal/gstrauss/master, I would appreciate it. See DevelGit.

You reported that the problem seems to be hidden when reducing the MTU or when using strace. Did you try sending strace to an output file in /dev/shm? strace -o /dev/shm/strace.log -s 128 -ttt ... as the in-memory filesystem might be sufficiently fast.

Can you reproduce this when the client is on localhost with the server, even if it competes for the same CPU resources?

Have you seen this behavior on any systems other than ARM64? (I was unable to reproduce this in x86_64)

RE: File download issues - Added by m.magliani@gmail.com over 3 years ago

Hi, some inline answers/considerations about your last message.

I have recently been working on some optimizations to the event framework in lighttpd, and also pushed a fix for https://redmine.lighttpd.net/boards/2/topics/8491 to an experimental branch of mine. If you could test git.lighttpd.net unstable branch personal/gstrauss/master, I would appreciate it. See DevelGit.

We cloned the git repository, we cross-compiled the source tree (lighttpd/1.4.54-devel-lighttpd-1.4.53-41-g92321450) and we pushed to target the new executable and the new modules.
Unfortunately nothing changes, the issue arises after few wget iterations.

Did you try sending strace to an output file in /dev/shm? strace -o /dev/shm/strace.log -s 128 -ttt ... as the in-memory filesystem might be sufficiently fast.

We made this attempt; we tried to write the log in a tmpfs filesystem with the hope not to overload the system with the filesystem flushes to disk.
Unfortunately the result doesn't change, the download rate is lower and the issue can't be reproduceable.

Can you reproduce this when the client is on localhost with the server, even if it competes for the same CPU resources?

No, we can't reproduce this when asking a file served through a local URL.
We tried several cURL commands but none of them failed:

$ for i in $(seq 1 50); do curl -k https://localhost/web/download/file.zip --output file${i}.zip; sleep 2; done

Every command succeded.

Have you seen this behavior on any systems other than ARM64? (I was unable to reproduce this in x86_64)

No, we haven't.
We dealt with another armv7a machine (by the way, same version of openssl compiled against lighttpd) with a less powerful CPU and there we didn't experience this kind of problem.

Some other info:
the issue doesn't arise when the file involved in the download is smaller than more or less 750KB;
another thing, we reduced the speed of the eth0 physical interface: we passed from 1000FULL to 100FULL

# cat /sys/class/net/eth0/speed
1000

then, after our system hacking:
# cat /sys/class/net/eth0/speed
100

In this way the issue can't be seen.

RE: File download issues - Added by gstrauss over 3 years ago

Thank you for continuing to test.

Last week, I ordered a Rock64 and it will be on a boat (or plane) soon. In the meantime, I have an idea that I plan to code up this weekend for another attempt.

RE: SOLVED: File download issues - Added by m.magliani@gmail.com over 3 years ago

Hi,
finally we solved the issue.
Lighttpd was not involved and it didn't cause the freeze of downloads.
It was a TCP problem occurring in a specific network with errors and we fixed it.
Then, now we are going on using lighttpd-1.4.49, ARM cross-compiled, with success.

Thank you for your attention and support.

RE: [UE] File download issues - Added by mark99 29 days ago

hi, i faced with a similar problem with lighttpd/1.4.63 on x86 server (ubuntu 22.04)

# lighttpd -V
lighttpd/1.4.63 (ssl) - a light and fast webserver

Event Handlers:

        - select (generic)
        + poll (Unix)
        + epoll (Linux)
        - /dev/poll (Solaris)
        - eventports (Solaris)
        - kqueue (FreeBSD)
        - libev (generic)

Network handler:

        + linux-sendfile
        - freebsd-sendfile
        - darwin-sendfile
        - solaris-sendfilev
        + writev
        + write
        - mmap support

Features:

        + IPv6 support
        + zlib support
        + zstd support
        - bzip2 support
        + brotli support
        + crypt support
        + OpenSSL support
        + mbedTLS support
        + NSS crypto support
        - GnuTLS support
        + WolfSSL support
        + Nettle support
        + PCRE support
        + MySQL support
        + PgSQL support
        + DBI support
        + Kerberos support
        + LDAP support
        + PAM support
        + memcached support
        - FAM support
        + LUA support
        + xml support
        + SQLite support
        + GDBM support
        + Y2038 support

# cat /etc/lighttpd/lighttpd.conf
server.modules = (
        "mod_openssl",
        "mod_auth",
#       "mod_evhost",
#       "mod_scgi",
        "mod_accesslog",
        "mod_proxy",
#       "mod_access",
        "mod_alias",
        "mod_deflate",
        "mod_redirect",
        "mod_rewrite",
        "mod_authn_file",
        "mod_webdav",
        "mod_extforward",
        "mod_cgi",
)

server.http-parseopts = (
  "header-strict"           => "enable",
  "host-strict"             => "enable",
  "host-normalize"          => "enable",
  "url-normalize-unreserved"=> "enable",
  "url-normalize-required"  => "enable",
  "url-ctrls-reject"        => "enable",
  "url-path-2f-decode"      => "enable",
  "url-path-dotseg-remove"  => "enable",
  "url-query-20-plus"       => "enable",
)

include "/etc/lighttpd/mimetype.assign.conf" 

dir-listing.activate = "disable" 
dir-listing.encoding = "utf-8" 

server.feature-flags = ( "server.h2proto" => "enable", "server.h2c" => "enable" )
server.stream-response-body = 1
server.stream-request-body = 1
server.chunkqueue-chunk-sz = 32768

server.document-root        = "/var/www/html" 
server.upload-dirs          = ( "/dev/shm", "/var/tmp" )
server.errorlog             = "/var/log/lighttpd/error.log" 
server.pid-file             = "/var/run/lighttpd.pid" 
server.username             = "www-data" 
server.groupname            = "www-data" 
server.port                 = 80

server.follow-symlink = "enable" 

index-file.names            = ( "index.php", "index.html", "index.lighttpd.html" )

static-file.exclude-extensions = ( ".php", ".py" )

deflate.cache-dir           = "/var/cache/lighttpd/compress/" 
deflate.mimetypes           = ( "application/javascript", "text/css", "text/html", "text/plain" )

include "conf-enabled/*.conf" 
include "sites/*.conf" 
# cat /etc/lighttpd/conf-enabled/15-fastcgi-php.conf
fastcgi.server += ( ".php" =>
        ((
                "socket" => "/var/run/php/php8.1-fpm.sock",
                "allow-x-send-file" => "enable",
                "broken-scriptfilename" => "enable" 
        ))
)

I wrote this PHP code to use sendfile:

$file_mime_types = finfo_open(FILEINFO_MIME_TYPE);
header('Content-Type: ' . finfo_file($file_mime_types, $file->local_path));
finfo_close($file_mime_types);

header('Content-Disposition: attachment; filename="' . $file->name . '"');

header('Expires: 0');
header('Cache-Control: no-store');
header('Pragma: public');

header('Content-Length: ' . $file->size);

Header( "X-LIGHTTPD-send-file: " . $file->local_path);

As a result, periodically downloading files hung in random positions (download speed dropped to 0 Bps)

$ for i in $(seq 1 50); do curl -k "https://example.com/dl.php" --output file${i}.iso; sleep 2; done
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  8 4462M    8  366M    0     0  3580k      0  0:21:16  0:01:44  0:19:32     0^C

I took the advice and tried to increase server.chunkqueue-chunk-sz to 32768, but it didn't help.

My solution: disable server.h2proto and server.h2c
server.feature-flags = ( "server.h2proto" => "disable", "server.h2c" => "disable" )

I do not know why, but in my case it fixed the problem, I tried downloading a 4.5 GB file through curl 15 times and no freezes were detected

RE: [UE] File download issues - Added by gstrauss 29 days ago

mark99 your issue sounds like an issue in curl https://github.com/curl/curl/pull/8965 which is fixed in curl 7.84.
lighttpd 1.4.65 contains a workaround for the potential degenerative behavior in earlier versions of curl.

RE: [UE] File download issues - Added by mark99 25 days ago

I looked at the problem on github and it has to do with upload, my problem manifested itself when downloading from a web server. moreover, it also manifested itself in ordinary browsers (chrome) - the download speed suddenly drops to 0 and nothing else happens

RE: [UE] File download issues - Added by gstrauss 25 days ago

I looked at the problem on github and it has to do with upload, my problem manifested itself when downloading from a web server. moreover, it also manifested itself in ordinary browsers (chrome) - the download speed suddenly drops to 0 and nothing else happens

You seem to be posting without testing and collecting new evidence. Please test with lighttpd 1.4.65 and if you still see an issue, then capture an strace of lighttpd during the issue and start a new forum post. Do not continue in this 3-year-old thread, as this thread was about lighttpd 1.4.49, before lighttpd added HTTP/2 support (lighttpd 1.4.56). The conclusion of this thread was that there was an error on the user's network which had nothing to do with lighttpd, so I do not know where you got the idea that this was an appropriate place to post your clearly distinct issue.

    (1-17/17)