Project

General

Profile

Bug #2586

[Solved] Memory leak when using readfile() in PHP over FastCGI

Added by joepie91 about 4 years ago. Updated 3 months ago.

Status:
Invalid
Priority:
Normal
Assignee:
-
Category:
core
Target version:
-
Start date:
2014-07-24
Due date:
% Done:

0%

Estimated time:
Missing in 1.5.x:

Description

On a site of mine, I experienced a severe memory leak in lighttpd when using the readfile() function in PHP. Any time a significant (~140MB) file was sent out using readfile(), lighttpd would take up a chunk of memory of approximately that size and never release it, leading to having to restart lighttpd frequently. I am unsure whether this also happened for small files. lighttpd appears to 'buffer up' the data internally, and simply never release the memory. The VPS would run out of memory and start swapping in a matter of minutes under any kind of traffic load.

The issue occurred on a Debian 7 (Wheezy) x64 KVM VPS, both on 1.4.31-4+deb7u3 (Debian Wheezy binary package) and 1.4.35-2 (Debian Sid unstable source package). As far as I can tell, this is an unsolved issue. I am unsure whether this is an issue in the lighttpd core or in the fastcgi module, but I've marked it as core high-priority, due to the severity of the leak.

I applied the following workaround (disabling the readfile call and manually sending chunks like it's a range request until the connection is closed, instead): https://github.com/joepie91/pdfy/commit/6f3359d2220fe376c81f1674fb52ef670247beeb

My lighttpd/PHP configuration:

fastcgi.server += ( ".php" =>
((
"bin-path" => "/usr/bin/php-cgi",
"socket" => "/tmp/php.socket"
))
)

History

#1

Updated by stbuehler about 4 years ago

  • Status changed from New to Invalid
  • Priority changed from High to Normal

It's not a leak, it is going to be reused (memory management is a complicated issue; "memory fragmentation" is probably the issue here).

Use X-Sendfile.

#2

Updated by joepie91 about 4 years ago

The memory was not reused, even if the requesting client had long disconnected or if the server was about to run out of RAM - it would simply OOM after a while. The memory usage never dropped, only rose. That certainly sounds like a memory leak to me.

Sendfile is not used because the code is meant to be HTTPd-independent. Not all HTTPds support X-Sendfile headers. Additionally, the [[http://redmine.lighttpd.net/projects/1/wiki/X-LIGHTTPD-send-file page on X-Sendfile support in lighttpd]] states that an older 1.4 version does not support range requests - which are an absolute requirement for my use case - and that the support of this in 1.5 is unknown. That leads me to believe it is not implemented in the versions I use either.

#3

Updated by joepie91 about 4 years ago

For clarification: the typical situation here is that a request for the large file is made and fulfilled using readfile, but aborted by the client (pdf.js) straight-away (due to the client realizing from the headers that it can do range requests instead). It may very well be an issue that only occurs for aborted requests.

#4

Updated by Olaf-van-der-Spek about 4 years ago

joepie91 wrote:
That leads me to believe it is not implemented in the versions I use either.

What versions do you use?

#5

Updated by joepie91 about 4 years ago

From my initial post:

The issue occurred on a Debian 7 (Wheezy) x64 KVM VPS, both on 1.4.31-4+deb7u3 (Debian Wheezy binary package) and 1.4.35-2 (Debian Sid unstable source package).

#6

Updated by darix about 4 years ago

Just use X-LIGHTTPD-send-file, readfile() is just stupid.

And lighttpd will reuse the memory ... but if you do this multiple time in parallel ... it will of course require multiple times the memory until you run into OOM.

#7

Updated by acleanslate about 4 years ago

  • Status changed from Invalid to Reopened

The fact it is failing to call free() on the memory after a client session closes clearly shows that it is not going to reuse the memory.

This is a textbook memory leak. If you are going to reuse some memory, you must clear it first (call free() on it, f. ex), not just let it sit idly and consume memory.

Apologies for repeating that statement a couple of times, but I wish to really drive the point across that if you intend to reuse memory, you have to clear it. You cannot just blindly hope it gets overwritten later on with new stuff (and hoping so is moronic - say hello to a potential information leak vulnerability! As if lighttpd could do with a few more CVE's to its name!), you gotta clear it. And remember, only call free() on it once - you don't want to go introducing a double free vulnerability either. Or reusing that memory at the wrong time and introducing, say, a use after free... Manipulate thine memory safely.

Furthermore, the developers dismissive attitude towards their users who do try contribute by reporting issues (i.e. "no do it our way, we will completely disregard your explaination for why what we suggest won't work because you are just a user) shows a systemic failure on the behalf of the developers to accept user input and handle it in a proper manner.

In conclusion: Once memory is done with, clear it. Use some free(). Not doing so is a memory leak, and will cause performance issues or potentially other issues down the line.

#8

Updated by darix about 4 years ago

  • Status changed from Reopened to Invalid
  1. free() does not clear the memory either. it just released it back to the OS
  2. lighttpd is properly reusing the memory. we just cant help when you have multiple parallel requests which send lots of data via readfile(). 100 requests using 100mb memory will mean 10G memory usage.
  3. X-LIGHTTPD-send-file is a way better alternative to just streaming out the big file via readfile(). X-Sendfile2 even allows you range responses.

there is no good reason to use readfile in any case.

#9

Updated by joepie91 about 4 years ago

darix wrote:

1. free() does not clear the memory either. it just released it back to the OS

How is it released back to the OS, when htop explicitly lists lighttpd as still using it?

darix wrote:

2. lighttpd is properly reusing the memory. we just cant help when you have multiple parallel requests which send lots of data via readfile(). 100 requests using 100mb memory will mean 10G memory usage.

I am not talking about parallel requests. I am talking about subsequent requests, where the memory is not released after the client has already disconnected. I'm not sure what is so hard to understand about this. The memory usage never ever goes down, it only goes up, as I have pointed out several times by now.

darix wrote:

3. X-LIGHTTPD-send-file is a way better alternative to just streaming out the big file via readfile(). X-Sendfile2 even allows you range responses.

there is no good reason to use readfile in any case.

I already pointed out that this is undesirable for HTTPd compatibility reasons. There is no reason why lighttpd should be unable to deal with readfile while other HTTPds are apparently not having any issues with it.

#10

Updated by darix about 4 years ago

joepie91 wrote:

darix wrote:

1. free() does not clear the memory either. it just released it back to the OS

How is it released back to the OS, when htop explicitly lists lighttpd as still using it?

on shutdown. try it if you dont believe me. i would really wonder if valgrind reports any leaks.

darix wrote:

2. lighttpd is properly reusing the memory. we just cant help when you have multiple parallel requests which send lots of data via readfile(). 100 requests using 100mb memory will mean 10G memory usage.

I am not talking about parallel requests. I am talking about subsequent requests, where the memory is not released after the client has already disconnected. I'm not sure what is so hard to understand about this. The memory usage never ever goes down, it only goes up, as I have pointed out several times by now.

yes. we dont really release much memory during normal run time.

darix wrote:

3. X-LIGHTTPD-send-file is a way better alternative to just streaming out the big file via readfile(). X-Sendfile2 even allows you range responses.

there is no good reason to use readfile in any case.

I already pointed out that this is undesirable for HTTPd compatibility reasons. There is no reason why lighttpd should be unable to deal with readfile while other HTTPds are apparently not having any issues with it.

make it a config option '$supports_x_sendfile2' and '$supports_x_sendfile'.

  1. nginx supports it http://wiki.nginx.org/XSendfile
  2. apache2 supports it via https://tn123.org/mod_xsendfile/
#11

Updated by Stekkz about 4 years ago

darix wrote:

joepie91 wrote:

darix wrote:

1. free() does not clear the memory either. it just released it back to the OS

How is it released back to the OS, when htop explicitly lists lighttpd as still using it?

on shutdown. try it if you dont believe me. i would really wonder if valgrind reports any leaks.

darix wrote:

2. lighttpd is properly reusing the memory. we just cant help when you have multiple parallel requests which send lots of data via readfile(). 100 requests using 100mb memory will mean 10G memory usage.

I am not talking about parallel requests. I am talking about subsequent requests, where the memory is not released after the client has already disconnected. I'm not sure what is so hard to understand about this. The memory usage never ever goes down, it only goes up, as I have pointed out several times by now.

yes. we dont really release much memory during normal run time.

darix wrote:

3. X-LIGHTTPD-send-file is a way better alternative to just streaming out the big file via readfile(). X-Sendfile2 even allows you range responses.

there is no good reason to use readfile in any case.

I already pointed out that this is undesirable for HTTPd compatibility reasons. There is no reason why lighttpd should be unable to deal with readfile while other HTTPds are apparently not having any issues with it.

make it a config option '$supports_x_sendfile2' and '$supports_x_sendfile'.

  1. nginx supports it http://wiki.nginx.org/XSendfile
  2. apache2 supports it via https://tn123.org/mod_xsendfile/

So if I understand correctly, you wouldn't release memory back to the OS unless you'd shutdown the server?
In what possible way is that productive? Imagine a server with some thousand concurrent user, and everytime the memory is full, you have to restart the server...

#12

Updated by joepie91 about 4 years ago

darix wrote:

joepie91 wrote:

darix wrote:

1. free() does not clear the memory either. it just released it back to the OS

How is it released back to the OS, when htop explicitly lists lighttpd as still using it?

on shutdown. try it if you dont believe me. i would really wonder if valgrind reports any leaks.

Sorry, what? It's not a memory leak because the memory is given back to the OS when lighttpd is shut down? By that definition, memory leaks don't even exist.

lighttpd is keeping RAM in use for storing data that it will never need again. That is pretty much the definition of a memory leak, and it means that lighttpd is not giving back RAM to the OS when it should be doing so.

I also don't really see what Valgrind has to do with this. It's keeping significant amounts of memory in use indefinitely for no good reason - that is a problem that needs solving, whether Valgrind thinks it's a memory leak or not.

darix wrote:

2. lighttpd is properly reusing the memory. we just cant help when you have multiple parallel requests which send lots of data via readfile(). 100 requests using 100mb memory will mean 10G memory usage.

I am not talking about parallel requests. I am talking about subsequent requests, where the memory is not released after the client has already disconnected. I'm not sure what is so hard to understand about this. The memory usage never ever goes down, it only goes up, as I have pointed out several times by now.

yes. we dont really release much memory during normal run time.

Because?

Isn't that basically the whole issue that acleanslate tried to point out to you?

darix wrote:

3. X-LIGHTTPD-send-file is a way better alternative to just streaming out the big file via readfile(). X-Sendfile2 even allows you range responses.

there is no good reason to use readfile in any case.

I already pointed out that this is undesirable for HTTPd compatibility reasons. There is no reason why lighttpd should be unable to deal with readfile while other HTTPds are apparently not having any issues with it.

make it a config option '$supports_x_sendfile2' and '$supports_x_sendfile'.

  1. nginx supports it http://wiki.nginx.org/XSendfile
  2. apache2 supports it via https://tn123.org/mod_xsendfile/

Which involves - apparently - several different implementations, all of which need to be maintained and configured, all just to deal with a bug in lighttpd?

#13

Updated by darix about 4 years ago

joepie91 wrote:

darix wrote:

joepie91 wrote:

darix wrote:

1. free() does not clear the memory either. it just released it back to the OS

How is it released back to the OS, when htop explicitly lists lighttpd as still using it?

on shutdown. try it if you dont believe me. i would really wonder if valgrind reports any leaks.

Sorry, what? It's not a memory leak because the memory is given back to the OS when lighttpd is shut down? By that definition, memory leaks don't even exist.

lighttpd is keeping RAM in use for storing data that it will never need again. That is pretty much the definition of a memory leak, and it means that lighttpd is not giving back RAM to the OS when it should be doing so.

I also don't really see what Valgrind has to do with this. It's keeping significant amounts of memory in use indefinitely for no good reason - that is a problem that needs solving, whether Valgrind thinks it's a memory leak or not.

darix wrote:

2. lighttpd is properly reusing the memory. we just cant help when you have multiple parallel requests which send lots of data via readfile(). 100 requests using 100mb memory will mean 10G memory usage.

I am not talking about parallel requests. I am talking about subsequent requests, where the memory is not released after the client has already disconnected. I'm not sure what is so hard to understand about this. The memory usage never ever goes down, it only goes up, as I have pointed out several times by now.

yes. we dont really release much memory during normal run time.

Because?

Isn't that basically the whole issue that acleanslate tried to point out to you?

memory leak means ... "never properly freed. not even at the end of the program"
as we reuse the buffers, we get larger more continuous memory regions for us and don't fragment memory too badly.

#14

Updated by joepie91 about 4 years ago

darix wrote:

joepie91 wrote:

darix wrote:

joepie91 wrote:

darix wrote:

1. free() does not clear the memory either. it just released it back to the OS

How is it released back to the OS, when htop explicitly lists lighttpd as still using it?

on shutdown. try it if you dont believe me. i would really wonder if valgrind reports any leaks.

Sorry, what? It's not a memory leak because the memory is given back to the OS when lighttpd is shut down? By that definition, memory leaks don't even exist.

lighttpd is keeping RAM in use for storing data that it will never need again. That is pretty much the definition of a memory leak, and it means that lighttpd is not giving back RAM to the OS when it should be doing so.

I also don't really see what Valgrind has to do with this. It's keeping significant amounts of memory in use indefinitely for no good reason - that is a problem that needs solving, whether Valgrind thinks it's a memory leak or not.

darix wrote:

2. lighttpd is properly reusing the memory. we just cant help when you have multiple parallel requests which send lots of data via readfile(). 100 requests using 100mb memory will mean 10G memory usage.

I am not talking about parallel requests. I am talking about subsequent requests, where the memory is not released after the client has already disconnected. I'm not sure what is so hard to understand about this. The memory usage never ever goes down, it only goes up, as I have pointed out several times by now.

yes. we dont really release much memory during normal run time.

Because?

Isn't that basically the whole issue that acleanslate tried to point out to you?

memory leak means ... "never properly freed. not even at the end of the program"

No, it doesn't.

as we reuse the buffers, we get larger more continuous memory regions for us and don't fragment memory too badly.

Okay, and why does it then keep reallocating new RAM, rather than using the RAM it already had allocated - that it evidently no longer needs, since the client for those particular chunks of data doesn't exist anymore?

#15

Updated by Stekkz about 4 years ago

darix wrote:

joepie91 wrote:

darix wrote:

joepie91 wrote:

darix wrote:

1. free() does not clear the memory either. it just released it back to the OS

How is it released back to the OS, when htop explicitly lists lighttpd as still using it?

on shutdown. try it if you dont believe me. i would really wonder if valgrind reports any leaks.

Sorry, what? It's not a memory leak because the memory is given back to the OS when lighttpd is shut down? By that definition, memory leaks don't even exist.

lighttpd is keeping RAM in use for storing data that it will never need again. That is pretty much the definition of a memory leak, and it means that lighttpd is not giving back RAM to the OS when it should be doing so.

I also don't really see what Valgrind has to do with this. It's keeping significant amounts of memory in use indefinitely for no good reason - that is a problem that needs solving, whether Valgrind thinks it's a memory leak or not.

darix wrote:

2. lighttpd is properly reusing the memory. we just cant help when you have multiple parallel requests which send lots of data via readfile(). 100 requests using 100mb memory will mean 10G memory usage.

I am not talking about parallel requests. I am talking about subsequent requests, where the memory is not released after the client has already disconnected. I'm not sure what is so hard to understand about this. The memory usage never ever goes down, it only goes up, as I have pointed out several times by now.

yes. we dont really release much memory during normal run time.

Because?

Isn't that basically the whole issue that acleanslate tried to point out to you?

memory leak means ... "never properly freed. not even at the end of the program"
as we reuse the buffers, we get larger more continuous memory regions for us and don't fragment memory too badly.

You're wrong there. But based on your definition of memory leak, you would have a memory leak as you are not properly freeing it.
If I read your second sentence correctly, that means that your continuous memory regions get larger and larger, which will eventually fill out your entire memory?

#16

Updated by Stekkz about 4 years ago

  • Status changed from Invalid to Reopened
#17

Updated by darix about 4 years ago

  • Status changed from Reopened to Invalid
#18

Updated by stbuehler about 4 years ago

  • Please respect the choices of members of the development team; an issue "reopening" (flame) war isn't going to get you anywhere. While I'm at it: just leave the priority selection to us too.
  • free() is not necessarily returning memory to the operating system; that is what the term "memory fragmentation" is about.
  • lighttpds memory handling might not be optimal; in some places it keeps certain structs for later reuse (to avoid more fragmentation; but it is hard to measure whether this is successful). In any case: if you can show us that lighttpd actually never reuses such memory, then I will agree it is a memory leak. The easiest way usually is to show that we don't have the pointer anymore (and valgrind can find those), but there are other possibilities.
  • Yes, the high memory usage is not what we like it to be; but this is due to some design choices (trying to read the response from the backend as fast as possible so it can handle the next request), and we're not going to throw those away now in the stable series. We are still working on the "2.0" development branch (with a more flexible design), which might work better in some cases (although I recently got reports about high memory usage I couldn't track down yet - after all, it is a "development" branch and not released yet).
#19

Updated by joepie91 about 4 years ago

lighttpds memory handling might not be optimal; in some places it keeps certain structs for later reuse (to avoid more fragmentation; but it is hard to measure whether this is successful). In any case: if you can show us that lighttpd actually never reuses such memory, then I will agree it is a memory leak. The easiest way usually is to show that we don't have the pointer anymore (and valgrind can find those), but there are other possibilities.

You're asking a web developer to provide technical proof and debugging data of a C application. I don't write C - if I were competent enough at C to provide the data you're looking for, I'd be submitting a patch, not a bugticket.

Yes, the high memory usage is not what we like it to be; but this is due to some design choices (trying to read the response from the backend as fast as possible so it can handle the next request), and we're not going to throw those away now in the stable series. We are still working on the "2.0" development branch (with a more flexible design), which might work better in some cases (although I recently got reports about high memory usage I couldn't track down yet - after all, it is a "development" branch and not released yet).

I don't necessarily disagree with this design choice. I take issue with it allocating new memory when there should be previously allocated memory available for use - whether it's freed to the OS or not is not relevant. If more RAM is allocated on each subsequent request to a file - while I can verify using the access log and lsof that no concurrent requests are taking place - then that is a problem. Upon premature disconnection of a client it should be throwing away the response data it had left, and reuse that memory for something else. It does not appear to be doing that.

#20

Updated by BadineKouly 3 months ago

  • File error.png added
#21

Updated by gstrauss 3 months ago

  • File deleted (error.png)
#22

Updated by gstrauss 3 months ago

  • Subject changed from Memory leak when using readfile() in PHP over FastCGI to [Solved] Memory leak when using readfile() in PHP over FastCGI

This was fixed in lighttpd 1.4.40, release July 2016. More details in commit 5a91fd4b

Also available in: Atom