fastcgi performance problem with large responses
When the fastcgi code reads a large chunk of response data from the network in one read, it creates a single chunk and then calls fastcgi_get_packet() repeatedly to extract packets from it.
The first thing that routine does is create a buffer and copy entire chunks into it until it has at least 8 bytes (the size of the fcgi packet header). It then reads the packet length from the header and, if necessary, copies more data from the remaining chunks until it has the whole packet, or discards data from the buffer if it copied too much while locating the header.
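For reference, here is a minimal sketch of how the total packet size can be derived from the 8-byte header, following the FastCGI record layout (version, type, request id, content length, padding length, reserved). The names fcgi_header_info and fcgi_decode_header are made up for illustration and are not taken from the lighttpd source.

```c
#include <stddef.h>
#include <stdint.h>

#define FCGI_HEADER_LEN 8

/* Hypothetical struct for this sketch; not the type used by the real code. */
typedef struct {
    uint8_t  version;
    uint8_t  type;
    uint16_t request_id;
    uint16_t content_length;
    uint8_t  padding_length;
} fcgi_header_info;

/* Decode an 8-byte FastCGI record header.
 * Returns the total record size: header + content + padding. */
static size_t fcgi_decode_header(const unsigned char h[FCGI_HEADER_LEN],
                                 fcgi_header_info *out)
{
    out->version        = h[0];
    out->type           = h[1];
    out->request_id     = (uint16_t)((h[2] << 8) | h[3]); /* big-endian */
    out->content_length = (uint16_t)((h[4] << 8) | h[5]); /* big-endian */
    out->padding_length = h[6];
    /* h[7] is reserved */

    return (size_t)FCGI_HEADER_LEN + out->content_length + out->padding_length;
}
```

Since the content length is a 16-bit field, only these 8 bytes are needed to know exactly how much more data belongs to the packet.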
This is wasteful, and the bigger the chunk, the worse it gets. If you have a megabyte chunk that came from the network in one read, then every call creates a buffer and copies a large amount of data into it, even though most of it will be discarded because the maximum fcgi packet length is 64 KB.
The attached patch changes the first loop in fastcgi_get_packet() to only copy enough bytes from the chunk to allow the header to be decoded. The second loop then continues as before and extracts the rest of the packet.
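The idea behind the first-loop change can be sketched roughly as follows. This is not the patch itself: demo_chunk and demo_peek are hypothetical, simplified stand-ins for the real chunk queue, used only to show copying a bounded number of bytes instead of whole chunks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified chunk; the real chunk type differs. */
typedef struct demo_chunk {
    const unsigned char *data;
    size_t len;      /* total bytes in this chunk */
    size_t offset;   /* bytes already consumed from this chunk */
    struct demo_chunk *next;
} demo_chunk;

/* Copy at most `want` unread bytes from the chunk list into dst,
 * without consuming them. Returns the number of bytes copied, which
 * is less than `want` if the chunks do not hold enough data yet. */
static size_t demo_peek(const demo_chunk *c, unsigned char *dst, size_t want)
{
    size_t got = 0;

    for (; c != NULL && got < want; c = c->next) {
        size_t avail = c->len - c->offset;
        size_t n = (want - got < avail) ? want - got : avail;

        /* Copy only what is still needed, never the whole chunk. */
        memcpy(dst + got, c->data + c->offset, n);
        got += n;
    }
    return got;
}
```

Calling demo_peek() with want set to 8 copies at most 8 bytes no matter how large the first chunk is, so the cost of locating the header no longer grows with the chunk size.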