Project

General

Profile

Feature #2139

Optimizing buffer_append_long()

Added by crypt over 9 years ago. Updated about 3 years ago.

Status:
Fixed
Priority:
Normal
Assignee:
-
Category:
core
Target version:
-
Start date:
2009-12-31
Due date:
% Done:

100%

Estimated time:
Missing in 1.5.x:
No

Description

lighttpd's internal buffer datatype has different functions in order to write data into the char array behind the scenes. One of them is buffer_append_long() which is as the name suggests, a function to append an integer to the buffer by using the internal function LI_ltostr(). The patch changes the calculation a bit to be able to deliver the result faster. I've made quite a few benchmarks for this as I needed a similar function in another program. And this little change improves the performance by a few percent - why should we give it away?

buffer.patch (670 Bytes) buffer.patch crypt, 2009-12-31 14:35

Related issues

Related to Feature #2126: lighttpd patchesDuplicate2009-12-28

Actions

History

#1

Updated by stbuehler over 9 years ago

I had a look at the patch:
I think the compiler should be able to optimize the use of "x = val % 10, y = val / 10" - perhaps it doesn't, but i don't like optimizing it myself (on x86 the "DIV" command calculates both values anyway).
The swap optimization is even uglier - you could make the swap variable local to the loop, but i doubt XOR is faster than swapping with a temporary var (did you actually use "-O2" for benchmarking it?).

#2

Updated by stbuehler over 9 years ago

  • Status changed from New to Wontfix
#3

Updated by stbuehler over 9 years ago

  • Target version deleted (1.4.26)
#4

Updated by crypt over 9 years ago

  • Assignee set to stbuehler

Sorry for my late response. The XOR-swap was just an attempt to remove the temporary variable. I think on most common architectures the way over a third variable should be imperceptibly slower than the way over bit manipulation. Under the few of readability you should leave this part.

But as you may noticed, the slow DIV is called twice even with -O23 and my optimization by rewriting the formula at this point makes the biggest performance benefit of the patch.

Ultimately, it's your decision, and there are surely many other points where divisions could be optimized in lighttpd, but as this function is called many times per request, I thought it could be a little improvement.

#5

Updated by stbuehler over 9 years ago

I don't know who told you that, but it is wrong.

Example (i'm on x86_64):

You will see, that the swap with temporary var is definitely faster; temporary vars are almost always held in registers; and for xor you need to copy them into a register anyway (the compiler adds temporary vars for every calculation step anyway).

In the case with div/mod: if you don't know the divisor, using / and % is faster (only one idiv); if the divisor is 10, the resulting code is pretty much the same (just not really readable, gcc uses some magic foo here :D), so it doesn't matter.

Now, some of these things may behave differently in loops; perhaps the compiler can somehow optimize some variants better than others; but this very much depends on the compiler you are using, and so i prefer to use the clean way, which compilers should be able to optimize way better.

Perhaps you want to read http://dl.fefe.de/optimizer-isec.pdf

void swap1(char *a, char *b) {
    char tmp = *a; *a = *b; *b = tmp;
}

void swap2(char *a, char *b) {
    *a ^= *b ^= *a ^= *b;
}

gcc -S -O3 test_swaps.c

swap1:
    movzbl  (%rdi), %edx
    movzbl  (%rsi), %eax
    movb    %al, (%rdi)
    movb    %dl, (%rsi)
    ret

swap2:
    movzbl  (%rsi), %eax
    movl    %eax, %edx
    xorb    (%rdi), %dl
    xorl    %edx, %eax
    movb    %al, (%rsi)
    xorl    %edx, %eax
    movb    %al, (%rdi)
    ret

And here the results for div+mod:

int a, b;

void test_divmod1(int x, int y) {
    a = x % y;
    b = x / y;
}

void test_divmod2(int x, int y) {
    a = x - (b = x / y) * x;
}

void test_divmod3(int x) {
    a = x % 10;
    b = x / 10;
}

void test_divmod4(int x) {
    a = x - (b = x / 10) * 10;
}
test_divmod1:
    movl    %edi, %edx
    movl    %edi, %eax
    sarl    $31, %edx
    idivl    %esi
    movl    %edx, a(%rip)
    movl    %eax, b(%rip)
    ret

test_divmod2:
    movl    %edi, %edx
    movl    %edi, %eax
    sarl    $31, %edx
    idivl    %esi
    movl    %eax, %ecx
    movl    %eax, b(%rip)
    movl    $1, %eax
    subl    %ecx, %eax
    imull    %edi, %eax
    movl    %eax, a(%rip)
    ret

test_divmod3:
    movl    %edi, %eax
    movl    $1717986919, %edx
    imull    %edx
    movl    %edi, %eax
    sarl    $31, %eax
    sarl    $2, %edx
    subl    %eax, %edx
    leal    (%rdx,%rdx,4), %eax
    movl    %edx, b(%rip)
    addl    %eax, %eax
    subl    %eax, %edi
    movl    %edi, a(%rip)
    ret

test_divmod4:
    movl    %edi, %eax
    movl    $1717986919, %edx
    imull    %edx
    movl    %edi, %eax
    sarl    $31, %eax
    sarl    $2, %edx
    subl    %eax, %edx
    movl    %edx, b(%rip)
    leal    (%rdx,%rdx,4), %edx
    addl    %edx, %edx
    subl    %edx, %edi
    movl    %edi, a(%rip)
    ret
#6

Updated by gstrauss over 3 years ago

  • Status changed from Wontfix to Fixed
  • Target version set to 1.4.x

optimized in r6afad87d

commit  6afad87d2ed66a48cda2a7c01dbcc59023774b73
Author: Stefan B├╝hler <stbuehler@web.de>
Date:   Sun Feb 8 12:37:10 2015 +0000

    git-svn-id: svn://svn.lighttpd.net/lighttpd/branches/lighttpd-1.4.x@2975 152afb58-edef-0310-8abb-c4023f1b3aa9
#7

Updated by stbuehler about 3 years ago

  • Assignee deleted (stbuehler)
  • Target version deleted (1.4.x)

Also available in: Atom