-
-
Notifications
You must be signed in to change notification settings - Fork 230
Fix #755: ARM benchmarks run without crash or swap #816
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
🤖 Welcome! Thanks for your interest in contributing to the project! Here is a short check-list to help you get started: Creating pull request
Code review
Refer to contribution guidelines for futher details. |
|
Hi! Thanks for looking into this, the approach with reducing benchmark parameters looks fine. So far I've tested your branch on 2 boxes, 32-bit RPi3B and 64-bit RPI4B, and I needed a few extra modifications: diff --git a/src/tests/roc_core/bench_mpsc_queue.cpp b/src/tests/roc_core/bench_mpsc_queue.cpp
index 0e8859f5..70b07f72 100644
--- a/src/tests/roc_core/bench_mpsc_queue.cpp
+++ b/src/tests/roc_core/bench_mpsc_queue.cpp
@@ -16,7 +16,7 @@ namespace roc {
namespace core {
namespace {
-enum { BatchSize = 10000, NumIterations = 5000000, NumThreads = 16 };
+enum { BatchSize = 10000, NumIterations = 500000, NumThreads = 4 };
#if defined(ROC_BENCHMARK_USE_ACCESSORS)
inline int get_thread_index(const benchmark::State& state) {
@@ -150,8 +150,6 @@ BENCHMARK_REGISTER_F(BM_MpscQueue, TryPopFront)
->Arg(1)
->Arg(2)
->Arg(4)
- ->Arg(8)
- ->Arg(16)
->Iterations(NumIterations)
->Unit(benchmark::kMicrosecond);
@@ -184,8 +182,6 @@ BENCHMARK_REGISTER_F(BM_MpscQueue, PopFront)
->Arg(1)
->Arg(2)
->Arg(4)
- ->Arg(8)
- ->Arg(16)
->Iterations(NumIterations)
->Unit(benchmark::kMicrosecond);
diff --git a/src/tests/roc_ctl/bench_task_queue_contention.cpp b/src/tests/roc_ctl/bench_task_queue_contention.cpp
index f1f8452d..5438654f 100644
--- a/src/tests/roc_ctl/bench_task_queue_contention.cpp
+++ b/src/tests/roc_ctl/bench_task_queue_contention.cpp
@@ -17,9 +17,9 @@ namespace ctl {
namespace {
enum {
- NumScheduleIterations = 2000000,
- NumScheduleAfterIterations = 20000,
- NumThreads = 8,
+ NumScheduleIterations = 200000,
+ NumScheduleAfterIterations = 5000,
+ NumThreads = 4,
BatchSize = 1000
};
diff --git a/src/tests/roc_pipeline/bench_pipeline_loop_contention.cpp b/src/tests/roc_pipeline/bench_pipeline_loop_contention.cpp
index 57b7be0b..00b32c9f 100644
--- a/src/tests/roc_pipeline/bench_pipeline_loop_contention.cpp
+++ b/src/tests/roc_pipeline/bench_pipeline_loop_contention.cpp
@@ -31,7 +31,7 @@ namespace {
enum {
SampleRate = 1000000, // 1 sample = 1 us (for convenience)
Chans = 0x1,
- NumThreads = 16,
+ NumThreads = 4,
NumIterations = 1000000,
BatchSize = 10000,
FrameBufSize = 100Can you please include those in your PR too? Also, I suggest to make this solution a bit more generic. First, arch-specific ifdefs (like I suggest introducing 3 benchmark profiles: LARGE (for workstation-class computers and larger), MEDIUM (for powerfull SBCs like raspberry pi), and SMALL (for weak CPUs like e.g. MIPS routers). Then in specific benchmarks we can do something like: #ifdef ROC_BENCHMARK_PROFILE_LARGE
NumScheduleIterations = 2000000,
#else
NumScheduleIterations = 200000,
#endifI've pushed a commit that defines Maybe something like this:
Also I think we should allow overwriting selected profile via a build option, like: You can find examples in In addition, currently number of threads in benchmarks is hard-coded, but obviously there is no good universal value. In practice we want We can implement Also, some benchmarks use: insteaf of Let me know what do you think about this ideas and how much time & interest do you have for this. If you wish to implement the above, you're very welcome, otherwise let me know what parts are you interested in and I'll work on the others when I have time. |
|
PS. cross-compiling benchmarks for aarch64 was failing for me, pushed a fix: c4ad8b8 |
|
Thanks! I'll update the PR as soon as I can. Don't want to give exact dates, but I'll aim within a week or two. I'll update with your changes and also add profiles. If time allows I can also add CpuCount(), will update you. |
* Drain pipeline queue before destruction → removes SIGABRT on aarch64. * Reduce ctl contention iterations to 200k on aarch64 on 2 GB Pi.
de4672b to
b3a3d67
Compare
|
Hi there, I implemented all changes except Thread::cpu_count(). I can do it for sure, and would love to, but does this have to be under same PR? Let me know. However, I got some, hopefully, minor issues with passing checks. This is what I did.
Why could reformatting changes this? Or perhaps something else. PS. I will also have a look later, but if you know the answer quicker, that would be great. This far, in actual benchmarks, I split benchmark NumIterations, NumThreads and other parameters based on either LARGE profile, or anything else. I only tested this on raspberry pi and my VM, which both should be considered MEDIUM profile. Do you think I should add different numbers for SMALL profile? If so, could you help me out with numbers there. I don't have any other devices to test this on. |
- Introduce a new SCons option `--bench-profile` to select benchmark profile: small, medium, or large. - SCons now sets the appropriate preprocessor define (`ROC_BENCHMARK_PROFILE_SMALL`, `_MEDIUM`, or `_LARGE`) based on the selected profile. - Update benchmark sources to use these defines to control dataset sizes, thread counts, and iteration counts for various benchmarks. - Default values are chosen if no profile is specified. - Refactor hardcoded benchmark parameters in `bench_mpsc_queue.cpp`, `bench_task_queue_contention.cpp`, and `bench_pipeline_loop_contention.cpp` to use the new profile-based configuration. - Included `bench_profile.h` to select default ROC_BENCHMARK_PROFILE_* for specific arch.
b3a3d67 to
8c18374
Compare
|
Due to previously failed ci check(failed loopback_sink_2_source test), I ran it(ci_checks/linux-x86_64/ubuntu-24.04.sh) locally, and it passed. Then I rerun whole ci checks and now they are passing. So I would assume "not stable" test. Otherwise, this PR is ready for review. Could someone take a look at it @gavv ? |
Fixes #755
All benchmarks pass on aarch64.
Tested on Raspberry Pi 4 with 2GB as well as VM with 3GB – no swap growth.