runtime-benchmarks

Benchmarks to compare the performance of async runtimes / executors.

An interactive view of the full results dataset is available at: https://fleetcode.com/runtime-benchmarks/

Summary of results for a single configuration:

Runtime	libfork	TooManyCooks	tbb	cppcoro	taskflow	coros	HPX	concurrencpp	libcoro
Mean Ratio to Best (lower is better)	1.00x	1.18x	2.77x	2.85x	3.43x	4.31x	161.07x	171.82x	2246.25x
skynet	39909 us	47183 us	139988 us	145840 us	201392 us	102525 us	15548196 us	12333520 us	156037584 us
nqueens	80369 us	82674 us	165430 us	183119 us	258068 us	863669 us	3170738 us	8256568 us	42238496 us
fib(39)	67544 us	98931 us	269527 us	277267 us	263881 us	182708 us	14420956 us	18497745 us	306545929 us
matmul(2048)	41013 us	42837 us	62564 us	55103 us	63544 us	50453 us	71603 us	66590 us	456916 us
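The "Mean Ratio to Best" row can be reproduced from the timings in the table. The sketch below assumes the metric is the arithmetic mean, across the four benchmarks, of each runtime's time divided by the fastest time for that benchmark. That definition is an inference from the numbers rather than something stated here, but it is consistent with the published values. The function and variable names are illustrative only.

```python
# Timings in microseconds, copied from the summary table above.
RUNTIMES = ["libfork", "TooManyCooks", "tbb", "cppcoro", "taskflow",
            "coros", "HPX", "concurrencpp", "libcoro"]
TIMINGS_US = {
    "skynet": [39909, 47183, 139988, 145840, 201392, 102525,
               15548196, 12333520, 156037584],
    "nqueens": [80369, 82674, 165430, 183119, 258068, 863669,
                3170738, 8256568, 42238496],
    "fib(39)": [67544, 98931, 269527, 277267, 263881, 182708,
                14420956, 18497745, 306545929],
    "matmul(2048)": [41013, 42837, 62564, 55103, 63544, 50453,
                     71603, 66590, 456916],
}


def mean_ratio_to_best(timings):
    """Average, per runtime, of (time / best time) over all benchmarks.

    Assumed definition: arithmetic mean of per-benchmark ratios.
    """
    n_runtimes = len(next(iter(timings.values())))
    totals = [0.0] * n_runtimes
    for row in timings.values():
        best = min(row)  # fastest runtime for this benchmark
        for i, t in enumerate(row):
            totals[i] += t / best
    return [total / len(timings) for total in totals]


ratios = mean_ratio_to_best(TIMINGS_US)
for name, ratio in zip(RUNTIMES, ratios):
    print(f"{name}: {ratio:.2f}x")
```

Because every benchmark contributes one ratio with equal weight, a runtime that is very slow on a single benchmark can dominate its mean, which is worth keeping in mind when reading the larger values.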

Machine configuration used in the summary table:

Processor: EPYC 7742 64-core processor
Worker Thread Count: 64 (no SMT)
OS: Debian 13 Server
Compiler: Clang 21.1.3 Release (-O3 -march=native)
CPU boost enabled / schedutil governor
Linked against libtcmalloc_minimal.so.4

What's covered?

Currently, only C++ frameworks are included, along with several recursive fork-join benchmarks:

recursive fibonacci (forks x2)
skynet, as in the original benchmark but increased to 100M tasks (forks x10)
nqueens (forks up to x14)
matmul (forks x4)

Benchmark problem sizes were chosen as a compromise: small enough to keep the total runtime of a full sweep tolerable (especially on weaker hardware with slower runtimes), yet large enough to show meaningful differences between the faster runtimes.
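To get a feel for how much work a recursive fork-join benchmark generates, the sketch below counts the tasks in a naive recursive fibonacci. It assumes one task per call, a base case of n < 2, and two forks for every other call. These are assumptions for illustration; the actual benchmark implementations may use a different cutoff. `fib_task_count` is a hypothetical helper, not part of this repository.

```python
def fib_task_count(n):
    """Count the calls made by naive recursive fib(n).

    Assumptions: calls with n < 2 are leaves, and every other call
    forks exactly two children, fib(n - 1) and fib(n - 2).
    """
    prev, curr = 1, 1  # task counts for n = 0 and n = 1
    for _ in range(2, n + 1):
        # One task for the call itself plus both subtrees.
        prev, curr = curr, 1 + prev + curr
    return curr


print(fib_task_count(39))  # 204668309 under the assumptions above
```

With hundreds of millions of very small tasks, the measured time is dominated by the cost of creating, scheduling, and joining tasks rather than by the arithmetic itself.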

How to build and run the benchmarks yourself

Install Dependencies:

The build+bench script uses python3
CMake + Clang 18 or newer
libfork and TooManyCooks depend on the hwloc library.
TBB benchmarks depend on a system-installed TBB. See the oneTBB installation guide for the newest version, or you may be able to find the older 'libtbb-dev' package in your system package manager.
A high performance allocator (tcmalloc, jemalloc, or mimalloc) is also recommended. The build script will dynamically link to any of these if they are available.

apt-get install cmake hwloc libhwloc-dev intel-oneapi-tbb-devel libtcmalloc-minimal4
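Before running the build script, it can help to confirm that the required tools are on your PATH. The following is a minimal sketch using only the Python standard library. It is not part of this repository, and the executable names (`python3`, `cmake`, `clang`, `clang++`) are assumptions that may differ on your system (for example, a versioned name such as `clang-18`). It does not check tool versions or library dependencies such as hwloc or TBB.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Executable names are assumptions; adjust them for your distribution.
    for name, path in find_tools(["python3", "cmake", "clang", "clang++"]).items():
        print(f"{name}: {path or 'NOT FOUND'}")
```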

Get Quick Results (uses threads = #CPUs):

python3 ./build_and_bench_all.py

Results will appear in the RESULTS.md and RESULTS.csv files.

Get Full Results (sweeps threads from 1 to #CPUs):

python3 ./build_and_bench_all.py full

Results will also appear in a RESULTS.json file, which can be parsed by the interactive benchmarks site. A locally viewable HTML version of the chart will be generated as well.

Future Plans

Frameworks to come:

(C#) .NET thread pool
(Rust) tokio
(Golang) goroutines
Facebook Folly
PhotonLibOS https://github.com/alibaba/PhotonLibOS

Benchmarks to come:

Lots of good inspiration here
