Project

General

Profile

Bug #2655

Serving a file that is getting updated can cause an empty response or incorrect content-length error

Added by tfischer over 2 years ago. Updated over 1 year ago.

Status:
Fixed
Priority:
High
Assignee:
-
Category:
core
Target version:
Start date:
2015-07-30
Due date:
% Done:

0%

Estimated time:
Missing in 1.5.x:

Description

Imagine a webcam at a ski area that is updating a still image once a minute. If the image is getting updated as it is getting fetched, the client fetching the image can get either an empty response or incorrect content-length depending on the lighttpd connection state machine. In looking at code, stat_cache_get_entry() does a stat() and open()/close() on the file, and later etwork_write_chunkqueue_linuxsendfile() again open()s the file to get the content. If the image update occurs between these two activities, then the client will not get the image file properly.

One solution is to open the file once and use the file description for fstat() and reading the data, then close the file. Any other ideas?

Here is the configuration and code to demonstrate the problem. I installed lighttpd in /tmp/lighttpd for my testing. You can use any two JPG files. Run alternate-images.sh to cause the image file being fetched to be constantly changed. In your browser, fetch refetch-image.html, which causes the image to be fetched again once the current screen update is done. You need to put refetch-image.html in the htdocs directory and have alternate-images.sh copy the image file to that directory as well. I tested with 1.4.36.

lighttpd.conf

server.event-handler = "linux-sysepoll" 
server.network-backend = "linux-sendfile" 
server.max-fds = 2048
server.stat-cache-engine = "simple" 
server.max-connections = 1024
index-file.names += (
  "index.xhtml", "index.html", "index.htm", "default.htm", "index.php" 
)
mimetype.assign             = (
  ".html"         =>      "text/html",
  ".jpg"          =>      "image/jpeg",
)
server.upload-dirs = ( "/tmp/lighttpd/uploads" )

alternate-images.sh

#!/bin/sh
HTDOCS=/tmp/lighttpd/www/htdocs

while true ; do
   cp img1.jpg $HTDOCS/img.jpg
   sleep 0.01
   cp img2.jpg $HTDOCS/img.jpg
done

refetch-image.html

<!DOCTYPE html>
<html>

<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <meta http-equiv="Cache-control" content="no-cache">
    <title>Demonstrate lighttpd content-length issue when image being fetched is being updated</title>
</head>

<body>
    <div id="video_image_div">
        <img id="video_image" src="img.jpg" onload="delayed_refresh_video()" />
    </div>
    <script>
        var preview_count = 0;

        function refresh_video() {
            preview_count++;
            document.getElementById("video_image").src = "img.jpg?img=" + preview_count
        }

        function delayed_refresh_video() {
            setTimeout(refresh_video, 30);
        }

        console.log("Starting repeated fetch of img.jpg file")
        refresh_video();
    </script>

</body>

</html>
lighttpd-use-single-fd-per-connection.patch (6.42 KB) lighttpd-use-single-fd-per-connection.patch Switch from stat() to fstat() using a cached file descriptor. tfischer, 2015-08-03 22:47
lighttpd-issue-2655-testing-notes.txt (3.75 KB) lighttpd-issue-2655-testing-notes.txt tfischer, 2016-03-30 15:29

Associated revisions

Revision a65c57a5 (diff)
Added by gstrauss over 1 year ago

[core] open fd when appending file to cq (fixes #2655)

http_chunk_append_file() opens fd when appending file to chunkqueue.
Defers calculation of content length until response is finished.

This reduces race conditions pertaining to stat() and then (later)
open(), when the result of the stat() was used for Content-Length
or to generate chunked headers.

Note: this does not change how lighttpd handles files that are modified
in-place by another process after having been opened by lighttpd --
don't do that. This does improve handling of files that are
frequently modified via a temporary file and then atomically renamed
into place.

mod_fastcgi has been modified to use http_chunk_append_file_range() with
X-Sendfile2 and will open the target file multiple times if there are
multiple ranges.

Note: (future todo) not implemented for chunk.[ch] interfaces used by
range requests in mod_staticfile or by mod_ssi. Those uses could lead
to too many open fds. For mod_staticfile, limits should be put in place
for max number of ranges accepted by mod_staticfile. For mod_ssi,
limits would need to be placed on the maximum number of includes, and
the primary SSI file split across lots of SSI directives should either
copy the pieces or perhaps chunk.h could be extended to allow for an
open fd to be shared across multiple chunks. Doing either of these
would improve the performance of SSI since they would replace many file
opens on the pieces of the SSI file around the SSI directives.

x-ref:
"Serving a file that is getting updated can cause an empty response or incorrect content-length error"
https://redmine.lighttpd.net/issues/2655

github:
Closes #49

History

#1

Updated by darix over 2 years ago

write to tempfile and rename the file to the final destination once it is fully written.

#2

Updated by stbuehler over 2 years ago

  • Status changed from New to Invalid

We will not read complete files into memory before responding. Use "atomic updating" as darix explained.

#3

Updated by tfischer over 2 years ago

darix wrote:

write to tempfile and rename the file to the final destination once it is fully written.

Thanks for the suggestion. I changed alternate-images.sh so that it now gets the image file fully written to a file and then renames the file.

I am still getting ERR_CONTENT_LENGTH_MISMATCH in the browser when I fetch refetch-image.html. When the rename occurs after the stat() and before the open() the length is for the old version of the file and the open() sends the contents from the new version of the file.

New version of alternate-images.sh based on darix feedback:

#!/bin/sh
while true ; do
   cp img1.jpg /tmp/lighttpd/www/htdocs
   mv /tmp/lighttpd/www/htdocs/img1.jpg /tmp/lighttpd/www/htdocs/img.jpg
   sleep 0.01
   cp img2.jpg /tmp/lighttpd/www/htdocs
   mv /tmp/lighttpd/www/htdocs/img2.jpg /tmp/lighttpd/www/htdocs/img.jpg
done

#4

Updated by tfischer over 2 years ago

stbuehler wrote:

We will not read complete files into memory before responding. Use "atomic updating" as darix explained.

I am not sure why you would need to read the complete file into memory. If you have an open file descriptor, and the file is deleted (either via an unlink or rename syscall), the process with the open file descriptor can continue to read the file. Once the process with the open file descriptor closes the file, then Linux does the actual deletion.

reference: http://stackoverflow.com/questions/2028874/what-happens-to-an-open-file-handler-on-linux-if-the-pointed-file-gets-moved-de

#5

Updated by stbuehler over 2 years ago

tfischer wrote:

stbuehler wrote:

We will not read complete files into memory before responding. Use "atomic updating" as darix explained.

I am not sure why you would need to read the complete file into memory. If you have an open file descriptor, and the file is deleted (either via an unlink or rename syscall), the process with the open file descriptor can continue to read the file. Once the process with the open file descriptor closes the file, then Linux does the actual deletion.

Which is exactly what the "atomic update" described by darix uses; the move-in-place will delete the old file, and replace it with the new file. lighty still has an fd for the old file, and all is well.

But if you actually modify the existing file, the lighty will read corrupt data, not the whole file, or tries to read to much if the file got smaller.

#6

Updated by stbuehler over 2 years ago

  • Status changed from Invalid to Reopened
  • Target version set to 1.4.37

tfischer wrote:

I am still getting ERR_CONTENT_LENGTH_MISMATCH in the browser when I fetch refetch-image.html. When the rename occurs after the stat() and before the open() the length is for the old version of the file and the open() sends the contents from the new version of the file.

Hm right, this race condition seems still to be present.
I've been working on this and related stuff in http://git.lighttpd.net/lighttpd/lighttpd-1.x.git/log/?h=lighttpd-1.4.x-stbuehler-api-cleanup

#7

Updated by tfischer over 2 years ago

stbuehler wrote:

tfischer wrote:

I am still getting ERR_CONTENT_LENGTH_MISMATCH in the browser when I fetch refetch-image.html. When the rename occurs after the stat() and before the open() the length is for the old version of the file and the open() sends the contents from the new version of the file.

Hm right, this race condition seems still to be present.
I've been working on this and related stuff in http://git.lighttpd.net/lighttpd/lighttpd-1.x.git/log/?h=lighttpd-1.4.x-stbuehler-api-cleanup

I looked over the code and figured moving the open file handle from the chunk structure to the connection structure might be a possible solution. I made the change and it works for the non-error paths. The change violates a stat() before open() comment; I am not sure if the comment is still valid. My patch is a few hours work and likely not of much production value. I include either for your reference for your humor.

#8

Updated by stbuehler over 2 years ago

  • Priority changed from Normal to High
#9

Updated by stbuehler over 2 years ago

  • Target version changed from 1.4.37 to 1.4.38
#10

Updated by stbuehler about 2 years ago

  • Target version changed from 1.4.38 to 1.4.39
#11

Updated by stbuehler almost 2 years ago

  • Target version changed from 1.4.39 to 1.4.40
#12

Updated by gstrauss over 1 year ago

  • Status changed from Reopened to Patch Pending
#13

Updated by tfischer over 1 year ago

I tested patch (bug-2655-http_chunk-open-fstat) on Linux running both chrome and firefox, mac running chrome and safari, win running edge (IE) and chrome, and android phone running chrome. All browsers (total 7 instances) refetching images as fast as they could. Only the android phone showed an issue and that seemed to happen when the phone came out of sleep mode. I let the test run around an hour.

I first ran lighttpd based on the code version prior to bug-2655-http_chunk-open-fstat to verify I was running the test properly. With just one browser refetching the image, it failed within 5 to 7 seconds each of the 5 test runs I tried.

Based on my alternate-images.sh test case, I verified bug-2655-http_chunk-open-fstat fixed defect #2655.

I am attaching my build, configure, run, and test instructions if someone wonders exactly what I did.

Next I will test on the embedded ARM (high speed camera) device where I first came across the defect.

Glenn, thanks for the fix.

#14

Updated by gstrauss over 1 year ago

Thanks for the thorough testing verification! :)

#15

Updated by gstrauss over 1 year ago

  • Status changed from Patch Pending to Fixed

committed to master

Also available in: Atom