[OT] Porting lighttpd to a new OS

Added by tamarnator over 6 years ago

I'm new to the forum, and apparently posted this to the wrong place originally. Here goes my second attempt.

I have been assigned to port code written several years ago, by an external contractor, to another OS. The original OS was ElinOS, with Linux 2.6.34.12, on an embedded system. The original system was running lighttpd 1.4.30, which I know is positively ancient.

Before proceeding, I need to explain 2 requirements:
1.) I need to move from ElinOS 2.6.34.12, to a buildRoot environment running exactly the same version of Linux.
2.) I need to port exactly the same versions of any SW on the original system, to the new system, without upgrades to any SW packages, including lighttpd.

This means I have to port lighttpd 1.4.30, without upgrading to the latest version. Personally, I would like to upgrade to the latest SW, but the customer forbids it. If questions about old versions of lighttpd are not entertained on this forum, please let me know, and I'll move on. If they are allowed, please read on.

The port has been progressing nicely, but I believe I encountered an issue with lighttpd. I'm looking for debugging suggestions. Perhaps I need a library or package that I have missed, or maybe I have issues with permissions.

The code in question is written in PHP, and is sending JSON encoded requests using Curl.

function list_connections()
{
        // Request for json-dbus-bridge: "service|object path", method, id, params
        $data = array(
                'service' => 'org.freedesktop.NetworkManagerSystemSettings|/org/freedesktop/NetworkManagerSettings',
                'method' => 'org.freedesktop.NetworkManagerSettings.ListConnections',
                'id' => 0,
                'params' => array()
        );

        $json_data = json_encode($data);
        $result = curl_post("http://localhost/rpc", $json_data);
        $result2 = json_decode($result);
        return $result2->result;
}

Here is the implementation of the curl_post() function, which is called above:

function curl_post($url, $post = NULL, array $options = array())
{
        $defaults = array(
                CURLOPT_POST => 1,
                CURLOPT_HEADER => 0,
                CURLOPT_URL => $url,
                CURLOPT_FRESH_CONNECT => 1,
                CURLOPT_RETURNTRANSFER => 1,
                CURLOPT_FORBID_REUSE => 1,
                CURLOPT_TIMEOUT => 4,
                CURLOPT_POSTFIELDS => $post
        );

        $ch = curl_init();
        curl_setopt_array($ch, ($options + $defaults));
        curl_setopt($ch, CURLOPT_HTTPHEADER, array("Expect:"));

        if ( ! $result = curl_exec($ch))
        {
                trigger_error(curl_error($ch));
        }
        curl_close($ch);
        return $result;
}

This code is all working under ElinOS. I believe the problem is that I am missing something in my buildRoot environment. I believe I have narrowed the issue down to the setup of the mod_fastcgi module in lighttpd.conf. Here is the code:


fastcgi.server = (
        "/rpc" => ((
                "bin-path" => "/usr/bin/json-dbus-bridge",
                "socket" => "/tmp/json-dbus-bridge.socket",
                "check-local" => "disable",
                "mode" => "responder",
                "max-procs" => 1,
        )),
)

The issue is that all requests that are sent to http://localhost/rpc are encountering the following error:

HTTP/1.1 500 Internal Server Error

My suspicion is that all requests sent to the /rpc prefix in the buildRoot environment are not being handled correctly. I'm wondering if lighttpd needs to be built with any particular options, if some directory or file permissions are incorrect, or if there are dependencies on other libraries or packages that I have not discovered yet.
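One way to take PHP out of the picture while debugging is to send the same request with the curl command-line tool. The following is only a sketch: it assumes the curl binary is available on the target, and the function names rpc_payload and post_rpc are made up for illustration. The JSON body is written by hand to be equivalent to what json_encode() produces from the array in list_connections().

```shell
#!/bin/sh
# Hypothetical helper: print a JSON body equivalent to the one built in list_connections().
rpc_payload() {
    printf '%s' '{"service":"org.freedesktop.NetworkManagerSystemSettings|/org/freedesktop/NetworkManagerSettings","method":"org.freedesktop.NetworkManagerSettings.ListConnections","id":0,"params":[]}'
}

# Hypothetical helper: POST the payload to the /rpc prefix.
# -i prints the status line and headers, so a "500 Internal Server Error" is visible.
# -H 'Expect:' mirrors the CURLOPT_HTTPHEADER setting in curl_post().
# --max-time 4 mirrors CURLOPT_TIMEOUT.
post_rpc() {
    curl -s -i --max-time 4 -H 'Expect:' --data "$(rpc_payload)" "${1:-http://localhost/rpc}"
}
```

Running post_rpc on the board while watching the lighttpd error log should show whether the 500 response and the mod_fastcgi messages appear together, independent of the PHP code.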

I managed to capture the following output in the logs, when attempting to send a JSON request through Curl to lighttpd:

mod_fastcgi.c:1732 connect failed: Connection refused on unix:/tmp/json-dbus-bridge.socket.0
mod_fastcgi.c:3025 backend died: we'll disable it for 1 seconds and send the request to another backend instead: reconnects: 0 load: 1
mod_fastcgi.c:1103 the fastcgi-backend /usr/bin/json-dbus-bridge failed to start
mod_fastcgi.c:1107 child exited with status 22 /usr/bin/json-dbus-bridge
mod_fastcgi.c:1110 if you're trying to run your app as a FastCGI backend, make sure if this is PHP on Gentoo, add fastcgi to the USE flags
mod_fastcgi.c:2842 ERROR: spawning fcgi failed

This code all works on ElinOS. Any debugging suggestions are welcome. Currently, I am sprinkling debug statements throughout mod_fastcgi.c, but I am unfamiliar with the code base, and thought I would see if there are any experts who can point me in the right direction.

Thanks in advance!


Replies (7)

RE: Porting lighttpd to a new OS - Added by avij over 6 years ago

What happens when you try to run that magical /usr/bin/json-dbus-bridge program from the command line, using the same user ID as your lighttpd is running as?

RE: Porting lighttpd to a new OS - Added by tamarnator over 6 years ago

When running from the command line, the user is root, and our current system does not include support for the 'su' command to switch users. lighttpd is running under the user name of www. I'll see if I can add support for switching user to www.

However, I did notice something curious. On the ElinOS system, which is working, if I type the 'ps' command, json-dbus-bridge shows up like this:

3751 www     4568 S    /sbin/lighttpd -f /etc/lighttpd.conf
3752 www     2648 S    /usr/bin/json-dbus-bridge

On the buildRoot system, which is failing, the 'ps' command shows this:

299 www      /sbin/lighttpd -f /etc/lighttpd.conf
301 www      [json-dbus-bridg]

The interesting thing is that for buildRoot, the json-dbus-bridge appears in brackets. I did some reading to figure what the difference is. I saw one page that indicated the brackets appear around command names when the arguments to that command cannot be located, as is often the case for system processes, or kernel threads.

I'm not sure if this has anything to do with how json-dbus-bridge is being invoked by fastcgi.

One other item I found has to do with how the dbus-daemon starts. On the ElinOS system, I can type json-dbus-bridge at a command prompt, as user root, and it will start up successfully. On the buildRoot system, I get the following error, when starting as root on the command line:


bridge_init failed, couldn't connect to dbus: org.freedesktop.DBus.Error.NotSupported: Using X11 for dbus-daemon autolaunch was disabled at compile time, set your DBUS_SESSION_BUS_ADDRESS instead

Some reading suggested running dbus-launch, which prints a value for DBUS_SESSION_BUS_ADDRESS, and then exporting its value, as follows:


# dbus-launch
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-34pf7mDs9b,guid=cd02807bc11a32f15a3e70e200000a53
DBUS_SESSION_BUS_PID=9058

# export DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-34pf7mDs9b,guid=cd02807bc11a32f15a3e70e200000a53

When I do this, I can start json-dbus-bridge from the command line, but it doesn't seem to associate with lighttpd. I suspect this is because it wasn't started by lighttpd, and the pid is wrong. I'm wondering if there is a way to run the dbus-launch command and export DBUS_SESSION_BUS_ADDRESS from within the fastcgi block in lighttpd.conf. I see fastcgi.server-options like bin-environment and bin-copy-environment, but I don't see how I could use these to run dbus-launch, and then export the resulting DBUS_SESSION_BUS_ADDRESS. Is there another method to do this?

I suspect the interaction between json-dbus-bridge and lighttpd is probably my root problem, but don't have a solution yet.

RE: Porting lighttpd to a new OS - Added by avij over 6 years ago

I'm not sure I agree with your conclusion, but apparently you can let dbus-launch execute other programs, letting dbus-launch set the environment variables. Therefore "bin-path" => "/usr/bin/dbus-launch /usr/bin/json-dbus-bridge" might set the environment variables as you want.

https://dbus.freedesktop.org/doc/dbus-launch.1.html
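If pointing bin-path directly at dbus-launch does not behave as hoped, a small wrapper script is another way to try the same idea. This is a sketch under assumptions, not something tested on this setup: the function name, the DBUS_LAUNCH override, and any wrapper path are hypothetical. It relies only on the --sh-syntax option of dbus-launch, which prints shell assignments for DBUS_SESSION_BUS_ADDRESS and DBUS_SESSION_BUS_PID.

```shell
#!/bin/sh
# Hypothetical wrapper: start a session bus, export its address, then
# replace this shell with the backend so it keeps the inherited file descriptors.
start_backend_with_session_bus() {
    # DBUS_LAUNCH is an illustrative override hook; the default is the real dbus-launch.
    launcher=${DBUS_LAUNCH:-dbus-launch}
    backend=${1:-/usr/bin/json-dbus-bridge}

    # Capture the assignments first so a failing launcher is detected.
    vars=$("$launcher" --sh-syntax) || return 1
    eval "$vars"
    export DBUS_SESSION_BUS_ADDRESS

    exec "$backend"
}
```

To use it, the file would end with a call to start_backend_with_session_bus, be made executable, and be named in "bin-path" in place of /usr/bin/json-dbus-bridge. Note that every spawn of the backend would start its own session bus this way.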

RE: Porting lighttpd to a new OS - Added by gstrauss over 6 years ago

tamarnator wrote:

The original system was running lighttpd 1.4.30, which I know is positively ancient.
[...]
This means I have to port lighttpd 1.4.30, without upgrading to the latest version.
Personally, I would like to upgrade to the latest SW, but the customer forbids it.

Even if your final solution has to be lighttpd 1.4.30, you probably should be testing with lighttpd 1.4.45 to get things working, and then backporting to lighttpd 1.4.30, if that is what the customer insists after you explain the security fixes that have been applied between lighttpd 1.4.30 and 1.4.45 over the course of 6 years.

tamarnator wrote:

mod_fastcgi.c:1103 the fastcgi-backend /usr/bin/json-dbus-bridge failed to start

As avij also noted, this is your problem.

Is dbus-daemon running on the old box? The new box? Is the dbus-core or similar package even installed? Also check the permissions on ~www/.dbus
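These checks can be run side by side on the old and new boxes. A minimal sketch follows; the function names are made up, and /var/run/dbus/system_bus_socket is assumed as the common default location of the system bus socket, which may differ on either image.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path (binary, socket, directory) exists.
report_path() {
    if [ -e "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Hypothetical helper: check the pieces mentioned above.
# The socket path is an assumed default, not taken from this thread.
check_dbus_paths() {
    report_path /usr/bin/dbus-daemon
    report_path /var/run/dbus/system_bus_socket
    report_path ~www/.dbus
}
```

For any path reported as ok, `ls -ld` on it shows the owner and mode, which can then be compared between ElinOS and buildRoot.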

RE: Porting lighttpd to a new OS - Added by tamarnator over 6 years ago

I'm trying out your suggestion, of moving to 1.4.45 for debugging purposes, and then I can back up to 1.4.30.

Just a little side bar about why the customer is so adamant that they stick with 1.4.30. The customer's device is a medical device, and there is an overwhelming amount of work and money required to get a device approved by the FDA. Any changes to software require another submission to the FDA for re-approval, which can go easily to millions of dollars, and months of time. Hence the strong desire to stick with the current software even though it is old. Turns out to be pretty self defeating, because you don't get the benefit of all the upgrades and bug fixes done to SW over time. Now back to the issue at hand.

The following information is showing up in the log with the new lighttpd version:

2017-09-21 16:13:10 UTC(+0000) [lighttpd] (mod_fastcgi.c.3574) child exited: 22 unix:/tmp/json-dbus-bridge.socket-0
2017-09-21 16:13:10 UTC(+0000) [lighttpd] (mod_fastcgi.c.1159) the fastcgi-backend /usr/bin/json-dbus-bridge failed to start:
2017-09-21 16:13:10 UTC(+0000) [lighttpd] (mod_fastcgi.c.1163) child exited with status 22 /usr/bin/json-dbus-bridge
2017-09-21 16:13:10 UTC(+0000) [lighttpd] (mod_fastcgi.c.1166) If you're trying to run your app as a FastCGI backend, make sure you'
2017-09-21 16:13:10 UTC(+0000) [lighttpd] (mod_fastcgi.c.2676) ERROR: spawning fcgi failed.

The difference is that json-dbus-bridge does not show up now, when I do a ps command, the way it did before. Still points to some issue with json-dbus-bridge. I modified my environment on the buildRoot board, so that I can switch user to www, instead of root, and attempted running json-dbus-bridge, as avij suggested. It fails to start, and gives this error message:

~ # json-dbus-bridge
bridge_init failedcouldn't connect to dbus: org.freedesktop.DBus.Error.NotSupported: Using X11 for dbus-daemon autolaunch was disabled at compile time, set your DBUS_SESSION_BUS_ADDRESS instead
~ #

I did some reading on "Linux From Scratch" about setting up and configuring the dbus. I found a recommendation to run these two commands, early during startup scripts:

~ # eval `dbus-launch`
~ # export DBUS_SESSION_BUS_ADDRESS

After running these two commands, I am able to enter json-dbus-bridge at a prompt, without getting error messages. The question was also asked if there is a dbus-daemon running, and there is. It is started as follows:

/usr/bin/dbus-daemon --system

What I am wondering is how to ensure that, when lighttpd starts json-dbus-bridge, it uses the DBUS_SESSION_BUS_ADDRESS created by dbus-launch.

I looked at adding bin-copy-environment to pass in the value of DBUS_SESSION_BUS_ADDRESS created by dbus-launch. I tried the following change in lighttpd.conf, but to no avail:

fastcgi.server = (
        "/rpc" => ((
                "bin-path" => "/usr/bin/json-dbus-bridge",
                "bin-copy-environment" => (
                        "DBUS_SESSION_BUS_ADDRESS" ),
                "socket" => "/tmp/json-dbus-bridge.socket",
                "check-local" => "disable",
                "mode" => "responder",
                "max-procs" => 1,
        )),
)


Any suggestions how to get DBUS_SESSION_BUS_ADDRESS, from the shell environment, recognized from inside lighttpd?
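Since bin-copy-environment copies variables from the environment lighttpd itself was started with, one thing worth checking is whether the running lighttpd process has DBUS_SESSION_BUS_ADDRESS at all. On Linux, /proc/PID/environ holds the environment a process was started with, as NUL-separated NAME=VALUE entries. The helper below is a sketch; the function name environ_value is made up.

```shell
#!/bin/sh
# Hypothetical helper: print the value of variable NAME from a NUL-separated
# environment file such as /proc/PID/environ.
# Returns non-zero if the variable is absent or empty.
environ_value() {
    file=$1
    name=$2
    tr '\0' '\n' < "$file" | sed -n "s/^${name}=//p" | grep .
}
```

For example, with lighttpd running as PID 299 as in the earlier ps output, `environ_value /proc/299/environ DBUS_SESSION_BUS_ADDRESS` run as root would print the address if lighttpd inherited it. If nothing is printed, the variable was exported in a shell other than the one that started lighttpd.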

RE: Porting lighttpd to a new OS - Added by avij over 6 years ago

avij wrote:

[...] but apparently you can let dbus-launch execute other programs, letting dbus-launch set the environment variables. Therefore "bin-path" => "/usr/bin/dbus-launch /usr/bin/json-dbus-bridge" might set the environment variables as you want.

https://dbus.freedesktop.org/doc/dbus-launch.1.html

RE: Porting lighttpd to a new OS - Added by gstrauss over 6 years ago

(avij has already pointed you to dbus documentation since it does not appear to be running properly on the new device environment, and does not appear to be a lighttpd issue)

Off-topic response to your customer:

tamarnator wrote:

Just a little side bar about why the customer is so adamant that they stick with 1.4.30. The customer's device is a medical device, and there is an overwhelming amount of work and money required to get a device approved by the FDA. Any changes to software require another submission to the FDA for re-approval, which can go easily to millions of dollars, and months of time. Hence the strong desire to stick with the current software even though it is old. Turns out to be pretty self defeating, because you don't get the benefit of all the upgrades and bug fixes done to SW over time.

The device manufacturer and provider also miss out on any security fixes, which your customer should be interested in, unless they want consumers suing them out of business if the device causes harm. Yes, there is a time cost to recertifying software. However, your customer apparently hasn't looked into the process for a while (6+ years?). The FDA is also aware of how important it is to incorporate security fixes into products.
