Bug #2655
closedServing a file that is getting updated can cause an empty response or incorrect content-length error
Description
Imagine a webcam at a ski area that is updating a still image once a minute. If the image is getting updated as it is getting fetched, the client fetching the image can get either an empty response or incorrect content-length depending on the lighttpd connection state machine. In looking at code, stat_cache_get_entry() does a stat() and open()/close() on the file, and later etwork_write_chunkqueue_linuxsendfile() again open()s the file to get the content. If the image update occurs between these two activities, then the client will not get the image file properly.
One solution is to open the file once and use the file description for fstat() and reading the data, then close the file. Any other ideas?
Here is the configuration and code to demonstrate the problem. I installed lighttpd in /tmp/lighttpd for my testing. You can use any two JPG files. Run alternate-images.sh to cause the image file being fetched to be constantly changed. In your browser, fetch refetch-image.html, which causes the image to be fetched again once the current screen update is done. You need to put refetch-image.html in the htdocs directory and have alternate-images.sh copy the image file to that directory as well. I tested with 1.4.36.
lighttpd.conf
server.event-handler = "linux-sysepoll" server.network-backend = "linux-sendfile" server.max-fds = 2048 server.stat-cache-engine = "simple" server.max-connections = 1024 index-file.names += ( "index.xhtml", "index.html", "index.htm", "default.htm", "index.php" ) mimetype.assign = ( ".html" => "text/html", ".jpg" => "image/jpeg", ) server.upload-dirs = ( "/tmp/lighttpd/uploads" )
alternate-images.sh
#!/bin/sh HTDOCS=/tmp/lighttpd/www/htdocs while true ; do cp img1.jpg $HTDOCS/img.jpg sleep 0.01 cp img2.jpg $HTDOCS/img.jpg done
refetch-image.html
<!DOCTYPE html> <html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> <meta http-equiv="Cache-control" content="no-cache"> <title>Demonstrate lighttpd content-length issue when image being fetched is being updated</title> </head> <body> <div id="video_image_div"> <img id="video_image" src="img.jpg" onload="delayed_refresh_video()" /> </div> <script> var preview_count = 0; function refresh_video() { preview_count++; document.getElementById("video_image").src = "img.jpg?img=" + preview_count } function delayed_refresh_video() { setTimeout(refresh_video, 30); } console.log("Starting repeated fetch of img.jpg file") refresh_video(); </script> </body> </html>
Files
Updated by darix over 9 years ago
write to tempfile and rename the file to the final destination once it is fully written.
Updated by stbuehler over 9 years ago
- Status changed from New to Invalid
We will not read complete files into memory before responding. Use "atomic updating" as darix explained.
Updated by tfischer over 9 years ago
darix wrote:
write to tempfile and rename the file to the final destination once it is fully written.
Thanks for the suggestion. I changed alternate-images.sh so that it now gets the image file fully written to a file and then renames the file.
I am still getting ERR_CONTENT_LENGTH_MISMATCH in the browser when I fetch refetch-image.html. When the rename occurs after the stat() and before the open() the length is for the old version of the file and the open() sends the contents from the new version of the file.
New version of alternate-images.sh based on darix feedback:
#!/bin/sh while true ; do cp img1.jpg /tmp/lighttpd/www/htdocs mv /tmp/lighttpd/www/htdocs/img1.jpg /tmp/lighttpd/www/htdocs/img.jpg sleep 0.01 cp img2.jpg /tmp/lighttpd/www/htdocs mv /tmp/lighttpd/www/htdocs/img2.jpg /tmp/lighttpd/www/htdocs/img.jpg done
Updated by tfischer over 9 years ago
stbuehler wrote:
We will not read complete files into memory before responding. Use "atomic updating" as darix explained.
I am not sure why you would need to read the complete file into memory. If you have an open file descriptor, and the file is deleted (either via an unlink or rename syscall), the process with the open file descriptor can continue to read the file. Once the process with the open file descriptor closes the file, then Linux does the actual deletion.
Updated by stbuehler over 9 years ago
tfischer wrote:
stbuehler wrote:
We will not read complete files into memory before responding. Use "atomic updating" as darix explained.
I am not sure why you would need to read the complete file into memory. If you have an open file descriptor, and the file is deleted (either via an unlink or rename syscall), the process with the open file descriptor can continue to read the file. Once the process with the open file descriptor closes the file, then Linux does the actual deletion.
Which is exactly what the "atomic update" described by darix uses; the move-in-place will delete the old file, and replace it with the new file. lighty still has an fd for the old file, and all is well.
But if you actually modify the existing file, the lighty will read corrupt data, not the whole file, or tries to read to much if the file got smaller.
Updated by stbuehler over 9 years ago
- Status changed from Invalid to Reopened
- Target version set to 1.4.37
tfischer wrote:
I am still getting ERR_CONTENT_LENGTH_MISMATCH in the browser when I fetch refetch-image.html. When the rename occurs after the stat() and before the open() the length is for the old version of the file and the open() sends the contents from the new version of the file.
Hm right, this race condition seems still to be present.
I've been working on this and related stuff in http://git.lighttpd.net/lighttpd/lighttpd-1.x.git/log/?h=lighttpd-1.4.x-stbuehler-api-cleanup
Updated by tfischer over 9 years ago
stbuehler wrote:
tfischer wrote:
I am still getting ERR_CONTENT_LENGTH_MISMATCH in the browser when I fetch refetch-image.html. When the rename occurs after the stat() and before the open() the length is for the old version of the file and the open() sends the contents from the new version of the file.
Hm right, this race condition seems still to be present.
I've been working on this and related stuff in http://git.lighttpd.net/lighttpd/lighttpd-1.x.git/log/?h=lighttpd-1.4.x-stbuehler-api-cleanup
I looked over the code and figured moving the open file handle from the chunk structure to the connection structure might be a possible solution. I made the change and it works for the non-error paths. The change violates a stat() before open() comment; I am not sure if the comment is still valid. My patch is a few hours work and likely not of much production value. I include either for your reference for your humor.
Updated by stbuehler over 9 years ago
- Target version changed from 1.4.37 to 1.4.38
Updated by stbuehler about 9 years ago
- Target version changed from 1.4.38 to 1.4.39
Updated by stbuehler almost 9 years ago
- Target version changed from 1.4.39 to 1.4.40
Updated by gstrauss over 8 years ago
- Status changed from Reopened to Patch Pending
Submitted pull request https://github.com/lighttpd/lighttpd1.4/pull/49
Updated by tfischer over 8 years ago
I tested patch (bug-2655-http_chunk-open-fstat) on Linux running both chrome and firefox, mac running chrome and safari, win running edge (IE) and chrome, and android phone running chrome. All browsers (total 7 instances) refetching images as fast as they could. Only the android phone showed an issue and that seemed to happen when the phone came out of sleep mode. I let the test run around an hour.
I first ran lighttpd based on the code version prior to bug-2655-http_chunk-open-fstat to verify I was running the test properly. With just one browser refetching the image, it failed within 5 to 7 seconds each of the 5 test runs I tried.
Based on my alternate-images.sh test case, I verified bug-2655-http_chunk-open-fstat fixed defect #2655.
I am attaching my build, configure, run, and test instructions if someone wonders exactly what I did.
Next I will test on the embedded ARM (high speed camera) device where I first came across the defect.
Glenn, thanks for the fix.
Updated by gstrauss over 8 years ago
Thanks for the thorough testing verification! :)
Also available in: Atom