Benchmarks to compare the performance of async runtimes / executors.
An interactive view of the full results dataset is available at: https://fleetcode.com/runtime-benchmarks/
Summary of results for a single configuration:
| Runtime | libfork | TooManyCooks | tbb | cppcoro | taskflow | coros | HPX | concurrencpp | libcoro |
|---|---|---|---|---|---|---|---|---|---|
| Mean Ratio to Best (lower is better) | 1.00x | 1.18x | 2.77x | 2.85x | 3.43x | 4.31x | 161.07x | 171.82x | 2246.25x |
| skynet | 39909 us | 47183 us | 139988 us | 145840 us | 201392 us | 102525 us | 15548196 us | 12333520 us | 156037584 us |
| nqueens | 80369 us | 82674 us | 165430 us | 183119 us | 258068 us | 863669 us | 3170738 us | 8256568 us | 42238496 us |
| fib(39) | 67544 us | 98931 us | 269527 us | 277267 us | 263881 us | 182708 us | 14420956 us | 18497745 us | 306545929 us |
| matmul(2048) | 41013 us | 42837 us | 62564 us | 55103 us | 63544 us | 50453 us | 71603 us | 66590 us | 456916 us |
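
The "Mean Ratio to Best" row can be reproduced from the timings, assuming it is the arithmetic mean of each runtime's per-benchmark time divided by the fastest time for that benchmark. That assumption matches the published figures for the three runtimes checked here, but it is an inference from the table, not taken from the build script. The function name `mean_ratio_to_best` is hypothetical.

```python
# Timings in microseconds, copied from the summary table (first three runtimes only).
timings_us = {
    "skynet":       {"libfork": 39909, "TooManyCooks": 47183, "tbb": 139988},
    "nqueens":      {"libfork": 80369, "TooManyCooks": 82674, "tbb": 165430},
    "fib(39)":      {"libfork": 67544, "TooManyCooks": 98931, "tbb": 269527},
    "matmul(2048)": {"libfork": 41013, "TooManyCooks": 42837, "tbb": 62564},
}


def mean_ratio_to_best(timings):
    """Assumed metric: average over benchmarks of (time / best time in that benchmark)."""
    runtimes = list(next(iter(timings.values())).keys())
    ratios = {name: [] for name in runtimes}
    for per_runtime in timings.values():
        best = min(per_runtime.values())  # fastest runtime for this benchmark
        for name, elapsed in per_runtime.items():
            ratios[name].append(elapsed / best)
    return {name: sum(r) / len(r) for name, r in ratios.items()}


for name, ratio in mean_ratio_to_best(timings_us).items():
    print(f"{name}: {ratio:.2f}x")
```

Because libfork is the fastest runtime in all four benchmarks of this configuration, restricting the data to a subset of runtimes does not change the per-benchmark best time.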
Machine configuration used in the summary table:
- Processor: EPYC 7742 64-core processor
- Worker Thread Count: 64 (no SMT)
- OS: Debian 13 Server
- Compiler: Clang 21.1.3 Release (-O3 -march=native)
- CPU boost enabled / schedutil governor
- Linked against libtcmalloc_minimal.so.4
The suite currently includes only C++ frameworks and several recursive fork-join benchmarks:
- recursive fibonacci (forks x2)
- skynet (original link), scaled up to 100M tasks (forks x10)
- nqueens (forks up to x14)
- matmul (forks x4)
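
As a rough illustration of what "forks x2" and "forks x10" mean, the sketch below models two of these workloads sequentially. It is not the repository's code: the function names are hypothetical, the problem sizes are tiny, and each recursive call stands in for a task that a real runtime would fork and later join. The skynet shape (each leaf returns its index and parents sum their children) is assumed from the commonly used form of that benchmark.

```python
def fib(n):
    """Recursive Fibonacci: every call with n >= 2 forks two children."""
    if n < 2:
        return n
    # In a fork-join runtime these two calls would run as separate tasks.
    return fib(n - 1) + fib(n - 2)


def skynet(base, size, fanout=10):
    """Skynet-style sum: a node covering `size` leaves forks `fanout` children.

    Each leaf returns its own index; each parent returns the sum of its children.
    `size` is assumed to be a power of `fanout`.
    """
    if size == 1:
        return base
    step = size // fanout
    # Fork `fanout` children, each covering a contiguous slice of leaves.
    return sum(skynet(base + i * step, step, fanout) for i in range(fanout))


print(fib(10))          # small stand-in for fib(39)
print(skynet(0, 1000))  # small stand-in for the 100M-task tree
```

The leaf work is trivial in both cases, so such benchmarks mostly measure how cheaply a runtime can create, schedule, and join tasks.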
Benchmark problem sizes were chosen to keep the total runtime of a full sweep tolerable (especially on weaker hardware with slower runtimes) while remaining large enough to show meaningful differentiation between the faster runtimes.
- The build+bench script uses python3
- CMake + Clang 18 or newer
- libfork and TooManyCooks depend on the hwloc library.
- TBB benchmarks depend on a system-installed TBB. See the installation guide here for the newest version, or you may be able to find the older 'libtbb-dev' package in your system package manager.
- A high performance allocator (tcmalloc, jemalloc, or mimalloc) is also recommended. The build script will dynamically link to any of these if they are available.
apt-get install cmake hwloc libhwloc-dev intel-oneapi-tbb-devel libtcmalloc-minimal4
python3 ./build_and_bench_all.py
Results will appear in RESULTS.md and RESULTS.csv files.
python3 ./build_and_bench_all.py full
Results will also appear in the RESULTS.json file, which can be parsed by the interactive benchmarks site. A locally viewable HTML version of the chart will be generated as well.
Frameworks to come:
- (C#) .NET thread pool
- (Rust) tokio
- (Golang) goroutines
- Facebook Folly
- PhotonLibOS https://github.com/alibaba/PhotonLibOS
Benchmarks to come:
- Lots of good inspiration here
