
So here are my preliminary benchmarks with my own implementation on an AMD EPYC 9334 32-core processor. I need to double-check things, so take this with a grain of salt for now. Times are in seconds for 100000 iterations of manorboy(10). So far, the only implementation that clearly sucks is std::function<>. Even trampolines are surprisingly good (though I can imagine they are much worse on other CPUs / architectures).

  xgcc (GCC) 16.0.0 20260103 (experimental)
  1.50 gcc -ftrampoline-impl=stack -Wl,-no-warn-execstack
  1.11 gcc -ftrampoline-impl=stack -Wl,-no-warn-execstack -DREFARG
  7.21 gcc -ftrampoline-impl=heap
  7.34 gcc -ftrampoline-impl=heap -DREFARG
  0.93 gcc -DWIDEPTR
  1.38 gcc -DWIDEPTR -DREFARG
  1.40 gcc -DDIRECT
  1.05 gcc -xc++ -std=c++26 -DFUNCREF -DDEDUCING
  19.68 gcc -xc++ -std=c++26 -DDEDUCING
  20.73 gcc -xc++ -std=c++26
  6.31 gcc -xc++ -std=c++26 -DDEDUCING -DREFARG
  6.31 gcc -xc++ -std=c++26 -DREFARG
  Debian clang version 16.0.6 (15~deb12u1)
  21.11 clang -xc++
  6.16 clang -xc++ -DREFARG
  1.66 clang -fblocks
  1.70 clang -fblocks -DREFARG
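
For anyone who wants to reproduce the std::function<> case, here is a minimal sketch of Knuth's man-or-boy test in C++. This is not the code behind the numbers above. The names Thunk, A and the manorboy(int) signature are my own assumptions, and the sum x4 + x5 is evaluated left to right explicitly because C++ leaves the operand order of + unspecified.

```cpp
#include <functional>

// Thunk: a parameterless callable returning int, type-erased via std::function.
// (Hypothetical name, chosen for this sketch.)
using Thunk = std::function<int()>;

// Knuth's procedure A. Thunks are taken by const reference; this is safe here
// because every call is strictly nested inside the frame that owns the thunk.
int A(int k, const Thunk& x1, const Thunk& x2, const Thunk& x3,
      const Thunk& x4, const Thunk& x5) {
    Thunk B;
    // B captures this frame's k and B itself by reference, so each call to B
    // decrements the k of the frame that created it.
    B = [&]() -> int {
        --k;
        return A(k, B, x1, x2, x3, x4);
    };
    if (k <= 0) {
        // Force left-to-right evaluation, as in the ALGOL 60 original.
        int lhs = x4();
        int rhs = x5();
        return lhs + rhs;
    }
    return B();
}

// Assumed entry point: A(k, 1, -1, -1, 1, 0) with constant thunks.
int manorboy(int k) {
    return A(k, [] { return 1; }, [] { return -1; }, [] { return -1; },
             [] { return 1; }, [] { return 0; });
}
```

With this sketch, manorboy(10) should return -67, the well-known result for k = 10. Timing it means calling it in a loop, e.g. 100000 times as above.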



Ah - thanks. I'll have a play with some of my systems, and see what it shows.



