Project

General

Profile

Actions

Bug #3137

closed

TRACEME environment option in tests broken with LISTEN_PID

Added by stbuehler about 3 years ago. Updated about 3 years ago.

Status:
Fixed
Priority:
Normal
Category:
test_component
Target version:
ASK QUESTIONS IN Forums:
No

Description

5b1b9f78247e122804490b17112e94df32851464 switched to using (systemd) socket activation to pass an ephemeral listening port to the test instance.

For this, LISTEN_PID needs to be set to the process id the test instance is running as.

But the TRACEME variants all call some wrapper program (gdb, strace, valgrind, ...), which doesn't update LISTEN_PID to the actual (sub) process id of the spawned instance; the nested process id also can't be predicted (for me adding 10 worked with gdb most of the time...).

The tests will then fail with something like this:

... lighttpd1.4/src/network.c.419) can't bind to socket: 0.0.0.0:80: Permission denied

  • There should be a way to require systemd socket activation in the config (instead of falling back to bind :80), and it should be used in the test configs
  • lighttpd should allow a special LISTEN_PID value (like "ignore") to skip the pid test; this should be used to spawn the test instance if TRACEME is active. (Maybe it would be possible to use something like LISTEND_PID=parent=xxx to have it match the parent process id; but I'm not sure if the TRACEME programs only add a single process layer in the tree.)
Actions #1

Updated by gstrauss about 3 years ago

  • Description updated (diff)
  • Category set to test_component
Actions #2

Updated by gstrauss about 3 years ago

Those ideas are good, and I am considering others.

On (native) Windows, I may need a different solution to socket activation.

It is possible that for testing, a lighttpd.conf.$test is created dynamically, and might incorporate LISTEN_PID=TRACEME or something like that to skip the test and reset the variable to getpid().

Actions #3

Updated by gstrauss about 3 years ago

My understanding is that the LISTEN_PID is a sanity check.
I believe that getppid() is fairly portable (and part of the POSIX.1-2001 standard), so this might be sufficient for (most) tracing.

--- a/src/network.c
+++ b/src/network.c
@@ -581,7 +581,7 @@ static int network_socket_activation_from_env(server *srv, network_socket_config
     char *listen_fds = getenv("LISTEN_FDS");
     pid_t lpid = listen_pid ? (pid_t)strtoul(listen_pid,NULL,10) : 0;
     int nfds = listen_fds ? atoi(listen_fds) : 0;
-    int rc = (lpid == getpid() && nfds > 0 && nfds < 5000)
+    int rc = ((lpid==getpid() || lpid==getppid()) && nfds > 0 && nfds < 5000)
       ? network_socket_activation_nfds(srv, s, nfds)
       : 0;
     unsetenv("LISTEN_PID");

Actions #4

Updated by stbuehler about 3 years ago

It certainly is a sanity check, and most certainly exactly against processes forwarding env and fds to nested processes. So allowing ppid to match by default is not a good idea imho.

Actions #5

Updated by gstrauss about 3 years ago

It certainly is a sanity check, and most certainly exactly against processes forwarding env and fds to nested processes. So allowing ppid to match by default is not a good idea imho.

Good point. [Edit: although if using the lighttpd server and these variables are set, it is likely that the lighttpd server is the target of systemd socket activation)]

What do you think of combining the getppid check along with requiring the existence of TRACEME environment variable?
(lpid == getppid() && NULL != getenv("TRACEME"))

Actions #6

Updated by stbuehler about 3 years ago

gstrauss wrote in #note-5:

It certainly is a sanity check, and most certainly exactly against processes forwarding env and fds to nested processes. So allowing ppid to match by default is not a good idea imho.

Good point. [Edit: although if using the lighttpd server and these variables are set, it is likely that the lighttpd server is the target of systemd socket activation)]

Either lighttpd was the target and we don't need to check LISTEN_PID in the first place, or we want a safe-guard if lighttpd wasn't actually the target, then we really shouldn't allow ppid.
The probability that lighttpd was the target has nothing to do with whether allowing ppid is a good idea (only whether checking LISTEN_PID is needed).

What do you think of combining the getppid check along with requiring the existence of TRACEME environment variable?
(lpid == getppid() && NULL != getenv("TRACEME"))

a) whether ppid actually matches (for all "TRACME" tools) is still not certain, right?
b) checking TRACEME seems very specific to me; accepting a LISTEND_PID=any (or LISTEND_PID=parent if you really want to go for ppid matching) or similar still sounds like a better solution to me

Apart from that I suppose it would work.

Actions #7

Updated by gstrauss about 3 years ago

a) whether ppid actually matches (for all "TRACME" tools) is still not certain, right?

In a test on Linux, valgrind preserve the original process pid.
In a test on Linux, strace and gdb change the process pid, and LISTEN_PID matches parent pid.
In a test on FreeBSD, truss changes the process pid, and LISTEN_PID matches parent pid.

My hesitancy in extending LISTEN_PID stems from its origin: systemd socket activation. Who is to say when systemd might extend it?
Then again, this is for a feature of lighttpd testing, and LISTEND_PID=parent presently seems like a reasonable solution.

Actions #8

Updated by gstrauss about 3 years ago

Well, LISTEN_PID=parent does not specify the pid expected to match.
I suppose I could use something like LISTEN_PID=parent:<pid>

Actions #9

Updated by gstrauss about 3 years ago

  • Status changed from New to Fixed
Actions #10

Updated by gstrauss about 3 years ago

  • Target version changed from 1.4.xx to 1.4.64
Actions

Also available in: Atom