tzcnt/runtime-benchmarks

runtime-benchmarks

Benchmarks to compare the performance of async runtimes / executors.

An interactive view of the full results dataset is available at: https://fleetcode.com/runtime-benchmarks/

Results summary table of a single configuration:

| Runtime | libfork | TooManyCooks | tbb | cppcoro | taskflow | coros | HPX | concurrencpp | libcoro |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mean Ratio to Best (lower is better) | 1.00x | 1.18x | 2.77x | 2.85x | 3.43x | 4.31x | 161.07x | 171.82x | 2246.25x |
| skynet | 39909 us | 47183 us | 139988 us | 145840 us | 201392 us | 102525 us | 15548196 us | 12333520 us | 156037584 us |
| nqueens | 80369 us | 82674 us | 165430 us | 183119 us | 258068 us | 863669 us | 3170738 us | 8256568 us | 42238496 us |
| fib(39) | 67544 us | 98931 us | 269527 us | 277267 us | 263881 us | 182708 us | 14420956 us | 18497745 us | 306545929 us |
| matmul(2048) | 41013 us | 42837 us | 62564 us | 55103 us | 63544 us | 50453 us | 71603 us | 66590 us | 456916 us |
Machine configuration used in the summary table:
  • Processor: EPYC 7742 64-core processor
  • Worker Thread Count: 64 (no SMT)
  • OS: Debian 13 Server
  • Compiler: Clang 21.1.3 Release (-O3 -march=native)
  • CPU boost enabled / schedutil governor
  • Linked against libtcmalloc_minimal.so.4
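
The README does not spell out how the "Mean Ratio to Best" row is derived. The published figures are consistent with one reading: for each benchmark, divide a runtime's time by the fastest time in that row, then take the arithmetic mean of those ratios. The sketch below checks that reading against three columns copied from the summary table. The helper name `mean_ratio_to_best` and the array layout are illustrative assumptions, not code from the repository.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Times in microseconds, copied from the summary table.
// Rows: skynet, nqueens, fib(39), matmul(2048).
// Columns: libfork, TooManyCooks, libcoro (a subset of the runtimes).
constexpr std::size_t kBenchmarks = 4;
constexpr std::size_t kRuntimes = 3;
constexpr std::array<std::array<double, kRuntimes>, kBenchmarks> kTimesUs = {{
    {39909.0, 47183.0, 156037584.0},
    {80369.0, 82674.0, 42238496.0},
    {67544.0, 98931.0, 306545929.0},
    {41013.0, 42837.0, 456916.0},
}};

// Hypothetical helper: arithmetic mean, over all benchmarks, of
// (this runtime's time) / (fastest time in that benchmark's row).
double mean_ratio_to_best(std::size_t runtime) {
    double sum = 0.0;
    for (const auto& row : kTimesUs) {
        const double best = *std::min_element(row.begin(), row.end());
        sum += row[runtime] / best;
    }
    return sum / static_cast<double>(kBenchmarks);
}
```

Because libfork holds the fastest time in every row of the table, restricting the data to these three columns does not change the per-row best. Under this assumption, `mean_ratio_to_best(1)` reproduces the 1.18x shown for TooManyCooks and `mean_ratio_to_best(2)` reproduces the 2246.25x shown for libcoro, after rounding to two decimals.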

What's covered?

Currently covers only C++ frameworks, using several recursive fork-join benchmarks:

  • recursive fibonacci (forks x2)
  • skynet, based on the original benchmark but increased to 100M tasks (forks x10)
  • nqueens (forks up to x14)
  • matmul (forks x4)
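
For readers unfamiliar with the pattern, the "forks x2" shape of the recursive fibonacci benchmark can be sketched with the standard library alone. This is only an illustration of fork-join recursion, not the code used by any of the benchmarked runtimes: it uses `std::async` with OS threads instead of lightweight tasks, and the `fork_depth` cutoff is an assumption added here to keep the thread count small. The function names are hypothetical.

```cpp
#include <cstdint>
#include <future>

// Plain serial recursion, used once the fork budget is exhausted.
std::uint64_t fib_serial(unsigned n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Fork-join recursion: each call forks one child onto another thread,
// computes the other child itself, then joins the two results.
// fork_depth limits how many levels of the recursion are allowed to fork.
std::uint64_t fib_fork_join(unsigned n, unsigned fork_depth) {
    if (n < 2) return n;
    if (fork_depth == 0) return fib_serial(n);

    // Fork: start the first child asynchronously.
    auto left = std::async(std::launch::async, fib_fork_join, n - 1, fork_depth - 1);
    // Run the second child on the current thread.
    const std::uint64_t right = fib_fork_join(n - 2, fork_depth - 1);
    // Join: wait for the forked child and combine.
    return left.get() + right;
}
```

A call such as `fib_fork_join(20, 4)` returns the same value as `fib_serial(20)` while creating at most 15 extra threads. A dedicated task runtime replaces the thread-per-fork step with much cheaper task scheduling, and that scheduling overhead is what these benchmarks stress. On some toolchains the sketch needs to be linked with `-pthread`.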

Benchmark problem sizes were chosen to balance two goals: keeping the total runtime of a full sweep tolerable (especially on weaker hardware with slower runtimes), and being large enough to show meaningful differences between the faster runtimes.

How to build and run the benchmarks yourself

Install Dependencies:

  • The build+bench script uses python3
  • CMake + Clang 18 or newer
  • libfork and TooManyCooks depend on the hwloc library.
  • TBB benchmarks depend on a system-installed TBB. See the TBB installation guide for the newest version, or you may be able to find the older 'libtbb-dev' package in your system package manager.
  • A high performance allocator (tcmalloc, jemalloc, or mimalloc) is also recommended. The build script will dynamically link to any of these if they are available.

apt-get install cmake hwloc libhwloc-dev intel-oneapi-tbb-devel libtcmalloc-minimal4

Get Quick Results (uses threads = #CPUs):

python3 ./build_and_bench_all.py

Results will appear in RESULTS.md and RESULTS.csv files.

Get Full Results (sweeps threads from 1 to #CPUs):

python3 ./build_and_bench_all.py full

Results will also appear in a RESULTS.json file, which can be parsed by the interactive benchmarks site. A locally viewable HTML version of the chart will be generated as well.

Future Plans

Frameworks to come:

Benchmarks to come:

  • Lots of good inspiration here
