Feature #2139
closedOptimizing buffer_append_long()
Description
lighttpd's internal buffer datatype has different functions in order to write data into the char array behind the scenes. One of them is buffer_append_long() which is as the name suggests, a function to append an integer to the buffer by using the internal function LI_ltostr(). The patch changes the calculation a bit to be able to deliver the result faster. I've made quite a few benchmarks for this as I needed a similar function in another program. And this little change improves the performance by a few percent - why should we give it away?
Files
Updated by stbuehler about 15 years ago
I had a look at the patch:
I think the compiler should be able to optimize the use of "x = val % 10, y = val / 10" - perhaps it doesn't, but i don't like optimizing it myself (on x86 the "DIV" command calculates both values anyway).
The swap optimization is even uglier - you could make the swap variable local to the loop, but i doubt XOR is faster than swapping with a temporary var (did you actually use "-O2" for benchmarking it?).
Updated by crypt about 15 years ago
- Assignee set to stbuehler
Sorry for my late response. The XOR-swap was just an attempt to remove the temporary variable. I think on most common architectures the way over a third variable should be imperceptibly slower than the way over bit manipulation. Under the few of readability you should leave this part.
But as you may noticed, the slow DIV is called twice even with -O23 and my optimization by rewriting the formula at this point makes the biggest performance benefit of the patch.
Ultimately, it's your decision, and there are surely many other points where divisions could be optimized in lighttpd, but as this function is called many times per request, I thought it could be a little improvement.
Updated by stbuehler about 15 years ago
I don't know who told you that, but it is wrong.
Example (i'm on x86_64):
You will see, that the swap with temporary var is definitely faster; temporary vars are almost always held in registers; and for xor you need to copy them into a register anyway (the compiler adds temporary vars for every calculation step anyway).
In the case with div/mod: if you don't know the divisor, using / and % is faster (only one idiv); if the divisor is 10, the resulting code is pretty much the same (just not really readable, gcc uses some magic foo here :D), so it doesn't matter.
Now, some of these things may behave differently in loops; perhaps the compiler can somehow optimize some variants better than others; but this very much depends on the compiler you are using, and so i prefer to use the clean way, which compilers should be able to optimize way better.
Perhaps you want to read http://dl.fefe.de/optimizer-isec.pdf
void swap1(char *a, char *b) { char tmp = *a; *a = *b; *b = tmp; } void swap2(char *a, char *b) { *a ^= *b ^= *a ^= *b; }
gcc -S -O3 test_swaps.c
swap1: movzbl (%rdi), %edx movzbl (%rsi), %eax movb %al, (%rdi) movb %dl, (%rsi) ret swap2: movzbl (%rsi), %eax movl %eax, %edx xorb (%rdi), %dl xorl %edx, %eax movb %al, (%rsi) xorl %edx, %eax movb %al, (%rdi) ret
And here the results for div+mod:
int a, b; void test_divmod1(int x, int y) { a = x % y; b = x / y; } void test_divmod2(int x, int y) { a = x - (b = x / y) * x; } void test_divmod3(int x) { a = x % 10; b = x / 10; } void test_divmod4(int x) { a = x - (b = x / 10) * 10; }
test_divmod1: movl %edi, %edx movl %edi, %eax sarl $31, %edx idivl %esi movl %edx, a(%rip) movl %eax, b(%rip) ret test_divmod2: movl %edi, %edx movl %edi, %eax sarl $31, %edx idivl %esi movl %eax, %ecx movl %eax, b(%rip) movl $1, %eax subl %ecx, %eax imull %edi, %eax movl %eax, a(%rip) ret test_divmod3: movl %edi, %eax movl $1717986919, %edx imull %edx movl %edi, %eax sarl $31, %eax sarl $2, %edx subl %eax, %edx leal (%rdx,%rdx,4), %eax movl %edx, b(%rip) addl %eax, %eax subl %eax, %edi movl %edi, a(%rip) ret test_divmod4: movl %edi, %eax movl $1717986919, %edx imull %edx movl %edi, %eax sarl $31, %eax sarl $2, %edx subl %eax, %edx movl %edx, b(%rip) leal (%rdx,%rdx,4), %edx addl %edx, %edx subl %edx, %edi movl %edi, a(%rip) ret
Updated by gstrauss almost 9 years ago
- Status changed from Wontfix to Fixed
- Target version set to 1.4.x
optimized in r6afad87d
commit 6afad87d2ed66a48cda2a7c01dbcc59023774b73 Author: Stefan Bühler <stbuehler@web.de> Date: Sun Feb 8 12:37:10 2015 +0000 git-svn-id: svn://svn.lighttpd.net/lighttpd/branches/lighttpd-1.4.x@2975 152afb58-edef-0310-8abb-c4023f1b3aa9
Updated by stbuehler over 8 years ago
- Assignee deleted (
stbuehler) - Target version deleted (
1.4.x)
Also available in: Atom