Bug #3201

closed

include_shell not working on CentOS 7 (or other platforms using glibc <= 2.23)

Added by fstelzer 10 months ago. Updated 10 months ago.

Status:
Fixed
Priority:
Normal
Category:
core
Target version:
ASK QUESTIONS IN Forums:
No

Description

The recent 1.4.70 release seems to break "include_shell" configs for CentOS 7. I could not replicate the issue on Rocky 8 though and have no idea what the actual underlying problem is or why it affects one and not the other.
But a git bisect traced the problem to commit:
701eb0a0cadd278466d179ae1d74d089ffaf25fd - [core] modify use of posix_spawnattr_setsigdefault
and indeed reverting just this commit fixes it for me.
I have no idea, though, what the intention of the diff was (besides a mention of reducing syscalls), and I probably don't have nearly enough knowledge of process/signal handling and lighttpd internals to suggest a fix...

My test config was just:
include_shell "echo \"server.port = 8080\""

and a syntax check is enough to trigger it:
./src/lighttpd -f test.conf -t


Related issues 1 (0 open, 1 closed)

Has duplicate Bug #3204: CentOS 7 issue with lighttpd 1.4.70 (or other platforms using glibc <= 2.23) (Duplicate)
Actions #1

Updated by gstrauss 10 months ago

My guess is that older versions of Linux fail for specific signals included in sigfillset().

Are you able to test with this patch?

--- a/src/fdevent.c
+++ b/src/fdevent.c
@@ -516,7 +516,7 @@ pid_t fdevent_fork_execve(const char *name, char *argv[], char *envp[], int fdin
        #endif
         && 0 == (rc = sigemptyset(&sigs))
         && 0 == (rc = posix_spawnattr_setsigmask(&attr, &sigs))
-      #ifdef __linux__
+      #if 0
         /* linux appears to walk all signals and to query and preserve some
          * sigaction flags even if setting to SIG_DFL, though if specified
          * in posix_spawnattr_setsigdefault(), resets to SIG_DFL without query.

Actions #2

Updated by fstelzer 10 months ago

gstrauss wrote in #note-1:

My guess is that older versions of Linux fail for specific signals included in sigfillset().

Are you able to test with this patch?
[...]

Yes, with this patch (which effectively reverts the mentioned commit, right?) include_shell works.

Actions #3

Updated by gstrauss 10 months ago

Thanks for testing.

The whole section using posix_spawn is new, though it is code that I wrote back in 2016 and did not release (due to CGI chdir() requirement). The special code for Linux (now commented out in the patch above) is marginally more efficient on linux (but I may consider choosing to use the other block for compatibility). For some reason, Linux or glibc appears to reset all the signals (all 64), whereas on at least some other platforms which I looked at, only the signals I listed in the subsequent block are reset.

At the moment, I am not planning an immediate release of lighttpd 1.4.71, unless other issues arise. Is this something that can be patched for CentOS? (How old is CentOS 7?) A quick internet search turned up that CentOS 8 was EOL Dec 31, 2021, and I presume CentOS 7 is also EOL.

Actions #4

Updated by fstelzer 10 months ago

no worries. let me know if i can do/test something else.

CentOS 7 and RHEL 7 still have maintenance support until June 2024, and I think they are still quite widespread (RHEL in general is popular for its 10+ years of support).

CentOS 8 was a special case for which they changed their release cycle.
https://endoflife.date/centos

However, most people I know just switched to other open downstream builds of RHEL, either Alma Linux or Rocky Linux, which still support v8 for many years to come:
https://endoflife.date/rocky-linux

Actions #5

Updated by gstrauss 10 months ago

I'll probably add a more proper fallback (and slightly larger patch), and will post a link here, but for the moment, lighttpd 1.4.71 is not imminent.

Actions #6

Updated by fstelzer 10 months ago

gstrauss wrote in #note-5:

I'll probably add a more proper fallback (and slightly larger patch), and will post a link here, but for the moment, lighttpd 1.4.71 is not imminent.

sounds good. thanks for your help!

Actions #7

Updated by gstrauss 10 months ago

  • Status changed from New to Patch Pending
  • Target version changed from 1.4.xx to 1.4.71

posix_spawnattr_setsigdefault() on CentOS Linux release 7.9.2009 (Core) is not tolerant of being passed a sigset_t initialized with sigfillset().
During the posix_spawn(), it fails with EINVAL when trying to set sigaction SIG_DFL for SIGKILL, SIGSTOP, and for signal 65, and presumably anything else higher in that set. Without verifying, my guess is that extra bits are set in the mask and the loop over the signals is not bounded by the range of valid signal numbers.

In any case, the code used on other platforms is more appropriate for a CentOS running linux kernel 3.10.0 and glibc 2.17.

It would appear that a future version of glibc (maybe glibc-2.38?) will fix the behavior I saw, which was the reason I added that special case for __linux__.
Therefore, the next version of lighttpd will remove that special case for __linux__.

On the glibc master branch:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=2053c11331991818882f7cf023ed2ce4ff44b274

The clone3 flag resets all signal handlers of the child not set to
SIG_IGN to SIG_DFL.  It allows to skip most of the sigaction calls
to setup child signal handling, where previously a posix_spawn
had to issue 2 times NSIG sigaction calls (one to obtain the current
disposition and another to set either SIG_DFL or SIG_IGN).

With POSIX_SPAWN_SETSIGDEF the child will setup the signal for the case
where the disposition is SIG_IGN.

The code must handle the fallback where clone3 is not available. This is
done by splitting __clone_internal_fallback from __clone_internal.

Actions #8

Updated by gstrauss 10 months ago

--- a/src/fdevent.c
+++ b/src/fdevent.c
@@ -516,14 +516,6 @@ pid_t fdevent_fork_execve(const char *name, char *argv[], char *envp[], int fdin
        #endif
         && 0 == (rc = sigemptyset(&sigs))
         && 0 == (rc = posix_spawnattr_setsigmask(&attr, &sigs))
-      #ifdef __linux__
-        /* linux appears to walk all signals and to query and preserve some
-         * sigaction flags even if setting to SIG_DFL, though if specified
-         * in posix_spawnattr_setsigdefault(), resets to SIG_DFL without query.
-         * Therefore, resetting all signals results in about 1/2 the syscalls.
-         * (FreeBSD appears more efficient.  Unverified on other platforms.) */
-        && 0 == (rc = sigfillset(&sigs))
-      #else
         /*(force reset signals to SIG_DFL if server.c set to SIG_IGN)*/
        #ifdef SIGTTOU
         && 0 == (rc = sigaddset(&sigs, SIGTTOU))
@@ -536,7 +528,6 @@ pid_t fdevent_fork_execve(const char *name, char *argv[], char *envp[], int fdin
        #endif
         && 0 == (rc = sigaddset(&sigs, SIGPIPE))
         && 0 == (rc = sigaddset(&sigs, SIGUSR1))
-      #endif
         && 0 == (rc = posix_spawnattr_setsigdefault(&attr, &sigs))) {

           #if defined(HAVE_POSIX_SPAWN_FILE_ACTIONS_ADDCLOSEFROM_NP) \
Actions #9

Updated by gstrauss 10 months ago

The recent 1.4.70 release seems to break "include_shell" configs for CentOS 7. I could not replicate the issue on Rocky 8 though and have no idea what the actual underlying problem is or why it affects one and not the other.

@fstelzer: out of curiosity, what are the versions of glibc (or other libc, such as musl) on the systems where it works and where it does not work?

Actions #10

Updated by gstrauss 10 months ago

From a very cursory read through the glibc git source history, the behavior I observed looks like it may have been introduced in glibc-2.24, which included commit 9ff72da471a509a8c19791efe469f47fa6977410 "posix: New Linux posix_spawn{p} implementation" from 19 Jan 2016.

Actions #11

Updated by gstrauss 10 months ago

This seems to work for me.

--- a/src/fdevent.c
+++ b/src/fdevent.c
@@ -516,12 +516,12 @@ pid_t fdevent_fork_execve(const char *name, char *argv[], char *envp[], int fdin
        #endif
         && 0 == (rc = sigemptyset(&sigs))
         && 0 == (rc = posix_spawnattr_setsigmask(&attr, &sigs))
-      #ifdef __linux__
-        /* linux appears to walk all signals and to query and preserve some
+      #if defined(__GLIBC__) \
+       && (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 24 && __GLIBC_MINOR__ <= 37)
+        /* glibc appears to walk all signals and to query and preserve some
          * sigaction flags even if setting to SIG_DFL, though if specified
          * in posix_spawnattr_setsigdefault(), resets to SIG_DFL without query.
-         * Therefore, resetting all signals results in about 1/2 the syscalls.
-         * (FreeBSD appears more efficient.  Unverified on other platforms.) */
+         * Therefore, resetting all signals results in about 1/2 the syscalls.*/
         && 0 == (rc = sigfillset(&sigs))
       #else
         /*(force reset signals to SIG_DFL if server.c set to SIG_IGN)*/

Actions #12

Updated by fstelzer 10 months ago

gstrauss wrote in #note-11:

This seems to work for me.
[...]

Works for me as well.
RHEL 7 has glibc-2.17, though probably with tons of Red Hat patches on top.
Thanks for your help. I'll roll out a patched version to our test env.

Actions #13

Updated by gstrauss 10 months ago

  • Status changed from Patch Pending to Fixed
Actions #14

Updated by gstrauss 10 months ago

  • Related to Bug #3204: CentOS 7 issue with lighttpd 1.4.70 (or other platforms using glibc <= 2.23) added
Actions #15

Updated by gstrauss 10 months ago

  • Subject changed from include_shell not working on all platforms to include_shell not working on CentOS 7 (or other platforms using glibc <= 2.23)
Actions #16

Updated by gstrauss 10 months ago

  • Has duplicate Bug #3204: CentOS 7 issue with lighttpd 1.4.70 (or other platforms using glibc <= 2.23) added
Actions #17

Updated by gstrauss 10 months ago

  • Related to deleted (Bug #3204: CentOS 7 issue with lighttpd 1.4.70 (or other platforms using glibc <= 2.23))