Project

General

Profile

Actions

Bug #3211

closed

mod_dirlisting includes readme / footer only sporadically

Added by laf0rge 11 months ago. Updated 11 months ago.

Status:
Fixed
Priority:
Normal
Category:
mod_dirlisting
Target version:
ASK QUESTIONS IN Forums:
No

Description

I noticed the following regresion after upgrading to Debian 12 (lighttpd-1.69) today:

When mod_dirlisting is configured to include a header and/or readme into the directory index (using the dir-listing.show-header or dir-listing.show-readme options), said header/readme files are not included reliably into the HTTP response.

Specifically, it looks as if the first request after a pause (or restart) gets the header/readme included, but any subsequent requests inside a certain period are rendered without the header/readme.

I tried different stat-cache implementations: "disable", "inotify" and "simple" - they all could reproduce the same behaviour.

When strace()ing lighttpd, one can clearly see that the README.txt file (I use dir-listing.show-readme = "README.txt") access pattern changes:

Working case:

newfstatat(AT_FDCWD, "/data/www/user_dir/HEADER.txt"...
openat(AT_FDCWD, "/data/www/user_dir/HEADER.txt"...

non-working case:

newfstatat(AT_FDCWD, "/data/www/user_dir/HEADER.txt"...

so somehow the file is not opened on the second request.

Some debugging and instrumentation later, it seems that the file is opened on first access, and then kept open by the stat_cache (even when "disable" is used, which probably is a separate bug).

However, the mod_dirlisting.c code read()s the file without rewinding back after reading it. This explains why on first read after open it succeeds, and subsequent reads then return no data as the read cursor is already at EOF.

I've checked the git master of today, and it seems there is still no rewind after the read even in master, so the problem should still exist.

A quick check of the commitlog suggest that the followign commit is likely causing the regression (in 1.4.60). Until that commit, the file was opened directly inside mod_dirlisting. But with this commit, the dependency on the stat_cache_get_entry_open() function was introduced.

commit f83ff671fcd348afde41ee50fea69149920004c0
Author: Glenn Strauss <gstrauss@gluelogic.com>
Date:   Sat Mar 20 07:57:35 2021 -0400

    [mod_dirlisting] limit buffer use for large dirs

Separate note: This commit log entry doesn't state anything about introducing support/dependency to the stat_cache, which is quite sneaky!


Files

0001-mod_dirindex-fix-unreliable-rendering-of-readme-head.patch (1.16 KB) 0001-mod_dirindex-fix-unreliable-rendering-of-readme-head.patch git patch against master. Tested on 1.69 but not on master! laf0rge, 2023-06-11 16:36
Actions #2

Updated by gstrauss 11 months ago

Thank you for the detailed bug report. Yes, you are correct. I may expand your patch a bit. Thank you for the starting suggestion.

I checked the rest of the code for calls to stat_cache_get_entry_open(), and other usage elsewhere appears fine, as reading from the chunkqueue uses pread() or lseek() to position the file pointer appropriately.

As a workaround, you might enable dir-listing.cache, which was added in lighttpd 1.4.60. The default dir-listing.cache parameter "max-age" => 15 should work fine with stat-cache implementations: "disable" and "simple", though the workaround might not be useful with stat-cache implementation "inotify", unless the directories in question are not modified frequently. YMMV.

Actions #3

Updated by gstrauss 11 months ago

  • Status changed from New to Patch Pending
  • Target version changed from 1.4.xx to 1.4.72

Performing the lseek() immediately prior to the read() seems to me like a better approach, and is what is done in src/mod_magnet_cache.c:script_cache_load_script()

--- a/src/mod_dirlisting.c
+++ b/src/mod_dirlisting.c
@@ -610,7 +610,8 @@ static void http_list_directory_include_file(request_st * const r, const handler
       stat_cache_get_entry_open(path, r->conf.follow_symlink);
     if (len)
         buffer_truncate(&r->physical.path, len);
-    if (NULL == sce || sce->fd < 0 || 0 == sce->st.st_size)
+    if (NULL == sce || sce->fd < 0 || 0 == sce->st.st_size
+        || -1 == lseek(sce->fd, 0, SEEK_SET))
         return;

     chunkqueue * const cq = &r->write_queue;

lighttpd is currently single-threaded. If lighttpd were to have multiple threads processing mod_dirlisting, then the read() could be replaced with lighttpd chunk_file_pread() to be thread-safe on systems with pread()

I'll run this through some tests tomorrow before committing. I may choose to use chunk_file_pread() anyway, as it may provide better performance on systems with pread() (by potentially avoiding the separate lseek())

Actions #4

Updated by gstrauss 11 months ago

--- a/src/mod_dirlisting.c
+++ b/src/mod_dirlisting.c
@@ -633,8 +633,10 @@ static void http_list_directory_include_file(request_st * const r, const handler
         buffer_clear(out);
         const int fd = sce->fd;
         ssize_t rd;
+        off_t off = 0;
         char buf[8192];
-        while ((rd = read(fd, buf, sizeof(buf))) > 0) {
+        while ((rd = chunk_file_pread(fd, buf, sizeof(buf), off)) > 0) {
+            off += rd;
             buffer_append_string_encoded(out, buf, (size_t)rd, ENCODING_MINIMAL_XML);
             if (out == tb) {
                 if (0 != chunkqueue_append_mem_to_tempfile(cq,
Actions #5

Updated by gstrauss 11 months ago

Some debugging and instrumentation later, it seems that the file is opened on first access, and then kept open by the stat_cache (even when "disable" is used, which probably is a separate bug).

Aside: That behavior is intentional and is therefore not a bug. Some stat cache interfaces open() the file and then fstat() to minimize ToC-ToU race conditions. The open fd and its stat info is cached together. If you are modifying the file in place (not recommended), rather than editing a copy and atomically replacing with rename(), then you probably want to use server.stat-cache-engine = "disable". If you are extending a file (such as a log file), then the cached size and associated ETag are/were accurate at the time the info was collected, and that will be what is returned in the response which uses the stat_cache entry until the stat_cache entry expires.

Separate note: This commit log entry doesn't state anything about introducing support/dependency to the stat_cache, which is quite sneaky!

As I am sure you have concluded, the bug was not intentional. The use of stat cache in mod_dirlisting also supports dir-listing.cache commit 378bc034, committed in Mar 2021 a bit more than a week after commit f83ff671 which you referenced above. I was working on those around the same time. I do agree that it is generally good practice to document important changes in the commit message. Sometimes such changes might be seen in a sequence of related commits within an given timeframe.

Actions #6

Updated by laf0rge 11 months ago

gstrauss wrote in #note-5:

Aside: That behavior is intentional and is therefore not a bug. Some stat cache interfaces open() the file and then fstat() to minimize ToC-ToU race conditions. The open fd and its stat info is cached together. If you are modifying the file in place, rather than editing a copy and atomically replacing with rename(), then you probably want to uses server.stat-cache-engine = "disable". If you are extending a file (such as a log file), then the cached size and associated ETag are/were accurate at the time the info was collected, and that will be what is returned in the response which uses the stat_cache entry until the stat_cache entry expires.

I'm not claiming that keeping the file open by the stat cache is a bug. I'm claiming that it is a bug that if I as the user set the stat_cache to "disable" in the config, I expect it to be disabled. But in fact it is still used, at least in the context of mod_dirlisting. From a user point of view (also after reading the docs), my expectation would be that the cache is disabled. Still, as we can sees by this bug, the cache is actually active with 1s expiration time. Please correct me if I'm wrong?

btw: Unfortunately I as reporter don't have the redmine permissions to edit/fix the subject of this issue, I'd appreciate if anyone with elevated privileges could do a s/mod_dirindex/mod_dirlisting/ on the subject, thanks.

Actions #7

Updated by laf0rge 11 months ago

gstrauss wrote in #note-3:

Performing the lseek() immediately prior to the read() seems to me like a better approach, and is what is done in src/mod_magnet_cache.c:script_cache_load_script()

I guess that's a matter of taste, certainly both will fix the bug.

I'll run this through some tests tomorrow before committing. I may choose to use chunk_file_pread() anyway, as it may provide better performance on systems with pread() (by potentially avoiding the separate lseek())

yes, it's always good to avoid another syscall and hence context-switch in the middle of HTTP request processing.

Actions #8

Updated by gstrauss 11 months ago

  • Subject changed from mod_dirindex includes readme / footer only sporadically to mod_dirlisting includes readme / footer only sporadically
  • Description updated (diff)

I'm claiming that it is a bug that if I as the user set the stat_cache to "disable" in the config, I expect it to be disabled. But in fact it is still used, at least in the context of mod_dirlisting. From a user point of view (also after reading the docs), my expectation would be that the cache is disabled. Still, as we can sees by this bug, the cache is actually active with 1s expiration time. Please correct me if I'm wrong?

I think you may be confusing lighttpd internals with the behavior that the user sees.
(strace() is not info that a web client sees, though an admin can observe it.)

A stat() is always performed in stat_cache_get_entry() when the stat cache is disabled, whereas other stat cache engine choices will return early if the entry has not expired in the cache. After a stat(), if the stat info does not compare equal with stat_cache_stat_eq() to the existing entry, then the stat cache entry is not re-used. If it does compare equal, then the entry may be reused internally by lighttpd. On systems where high-precision timestamps are available in the stat info, lighttpd uses those high precision timestamps in the comparison.

As I mentioned in my prior comment, this bit of lighttpd internals returns a stat_cache object with file size and date information, and depending on the stat cache interface, may also include an open fd to the file from which fstat() info was obtained. This object is used by lighttpd internally, e.g. for a consistent HTTP response including Content-Length, Last-Modified, ETag, and the file content that matches that metadata (or at least did at the time the file was open() and fstat()).


Performing the lseek() immediately prior to the read() seems to me like a better approach, and is what is done in src/mod_magnet_cache.c:script_cache_load_script()

I guess that's a matter of taste, certainly both will fix the bug.

Sorry, no. It is not a matter of taste. Your proposed solution assumes that nothing else in lighttpd which uses the stat_cache might read and change the file position before the next use in mod_dirlisting. That is not necessarily true. My initial proposed solution, valid in single-threaded lighttpd, positions the file pointer immediately before use by read() and is therefore a better approach.


Again, thank you for troubleshooting and reporting the bug in mod_dirlisting which reused the file descriptor without first repositioning the file pointer. That is a bug in lighttpd and will be fixed.

The other statements and opinions you have shared are not as convincing.

Actions #9

Updated by gstrauss 11 months ago

The default behavior in lighttpd mod_dirlisting is:
dir-listing.encode-header = "enable"
dir-listing.encode-readme = "enable"

Another workaround: The bug can be avoided if those are set to "disable" and if the header/readme files are valid HTML which do not need to be encoded before being included in the mod_dirlisting response.

Actions #10

Updated by laf0rge 11 months ago

gstrauss wrote in #note-8:

I'm claiming that it is a bug that if I as the user set the stat_cache to "disable" in the config, I expect it to be disabled. But in fact it is still used, at least in the context of mod_dirlisting. From a user point of view (also after reading the docs), my expectation would be that the cache is disabled. Still, as we can sees by this bug, the cache is actually active with 1s expiration time. Please correct me if I'm wrong?

I think you may be confusing lighttpd internals with the behavior that the user sees.
(strace() is not info that a web client sees, though an admin can observe it.)

To clarify: When I speak of the "user", I'm not speaking about a remote HTTP client, but me as the person who installs and operates the software lighttpd, utilizing the available documentation to do so.

When observing the bug, it immediately "smelled" like there is some kind of caching going on, as a sufficiently long wait between two HTTP requests made the readme/footer render again. So the next step was to disable any kind of caching described in the documentation. When the behavior didn't change, I was very puzzled and this sent me off to look for the bug in other parts of the code which are not the cache, because the cache was presumably disabled ;)

So you are saying that the stat result is not cached when the stat-cache is set to disable, ok. My implicit assumption was that nothing would be cached at all. But in fact, a cache still appears to be used, and it caches the open file descriptor.

After a stat(), if the stat info does not compare equal with stat_cache_stat_eq() to the existing entry, then the stat cache entry is not re-used.

I hear you. But IMHO, the mere fact that there is (stat) cache entry which could be re-used means that the cache is not disabled.

Out of curiosity: Is there any configuration I could have don to avoid any caching whatsoever? Basically a dumb "start from scratch with every HTTP request, not keeping any state"?

This object is used by lighttpd internally, e.g. for a consistent HTTP response including Content-Length, Last-Modified, ETag, and the file content that matches that metadata (or at least did at the time the file was open() and fstat()).

I understand that this needs to be consistent during one given request. But caching something (the FD here) beyond the duration of a request is still "caching". My mistake was to assume that disabling the stat cache would make it go away completely (no stat cache entries exist, at the very least not beyond a single request) and of course also any information associated with the cache entry, such as the fd.

Anyway, I don't want to eat up more of your valuable time. Thanks for lighttpd, and thanks for your super fast response to the reported issue!

Actions #11

Updated by gstrauss 11 months ago

There was a bug in lighttpd. Thank you for for finding it, for reporting it, and for providing a detailed analysis. That effort is very much appreciated.


Out of curiosity: Is there any configuration I could have don to avoid any caching whatsoever? Basically a dumb "start from scratch with every HTTP request, not keeping any state"?

Not recommended: You can run lighttpd from xinetd and disable keep-alives so that each request is handled in a separate lighttpd instance which then exits. Not recommended for most usage, though it will work, even if much less efficiently than running lighttpd as a daemon.

Your subsequent "I would have identified the bug more quickly if ..." has led you to multiple wrong assumptions which for some reason you seem to keep trying to argue. Please recognize this behavior of yours.

lighttpd internals are internals. Specifics of lighttpd internals are not within the realm of "user" questions. They are "developer" questions.

I have explained multiple times that your "developer" assumptions about lighttpd internals were incorrect, and I took time to explain why. I have provided details to back up my statements that the behavior is intentional. I will not have any further response to this line of questioning.


There was a bug in lighttpd. Thank you for finding it, for reporting it, and for providing a detailed analysis. That effort is very much appreciated.

Some workarounds are noted in comments above, and might be tried until a patch finds its way into Debian.

Actions #12

Updated by gstrauss 11 months ago

  • Status changed from Patch Pending to Fixed
Actions

Also available in: Atom